Does $\ell_p$-minimization outperform $\ell_1$-minimization?

Le Zheng, Arian Maleki, Haolei Weng, Xiaodong Wang, Teng Long


Abstract 

In many application areas ranging from bioinformatics to imaging, we are faced with the following question: can we recover a sparse vector $x_o \in \mathbb{R}^N$ from its undersampled set of noisy observations $y \in \mathbb{R}^n$, $y = Ax_o + w$? The last decade has witnessed a surge of algorithms and theoretical results to address this question. One of the most popular algorithms is the $\ell_p$-regularized least squares given by the following formulation:

$$\hat{x}(\gamma, p) \in \arg\min_x \frac{1}{2}\|y - Ax\|_2^2 + \gamma\|x\|_p^p,$$

where $p \in [0, 1]$. Among these optimization problems, the case $p = 1$, also known as LASSO, is the best accepted in practice, for the following two reasons: (i) thanks to the extensive studies performed in the fields of high-dimensional statistics and compressed sensing, we have a clear picture of LASSO's performance; (ii) it is convex, and efficient algorithms exist for finding its global minimum.

Unfortunately, neither of the above two properties holds for $0 < p < 1$. However, these problems are still appealing because of the following folklore beliefs in high-dimensional statistics: (i) $\hat{x}(\gamma, p)$ is closer to $x_o$ than $\hat{x}(\gamma, 1)$. (ii) If we employ iterative methods that aim to converge to a local minimum of $\arg\min_x \frac{1}{2}\|y - Ax\|_2^2 + \gamma\|x\|_p^p$, then under good initialization these algorithms converge to a solution that is still closer to $x_o$ than $\hat{x}(\gamma, 1)$. In spite of the plentiful empirical results that support these folklore theorems, the theoretical progress to establish them has been very limited.

This paper aims to study the above folklore theorems and establish their scope of validity. Starting with the approximate message passing (AMP) algorithm as a heuristic method for solving $\ell_p$-regularized least squares, we study the following questions: (i) What is the impact of initialization on the performance of the algorithm? (ii) When does the algorithm recover the sparse signal $x_o$ under a "good" initialization? (iii) When does the algorithm converge to the sparse signal regardless of the initialization? Studying these questions will not only shed light on the second folklore theorem, but also lead us to the answer of the first one, i.e., the performance of the global optima $\hat{x}(\gamma, p)$. For that purpose, we employ the replica analysis to show the connection between the solution of AMP and $\hat{x}(\gamma, p)$ in the asymptotic settings. This enables us to compare the accuracy of $\hat{x}(\gamma, p)$ and $\hat{x}(\gamma, 1)$. In particular, we present an accurate characterization of the phase transition and noise sensitivity of $\ell_p$-regularized least squares for every $0 < p < 1$. Our results in the noiseless setting confirm that $\ell_p$-regularized least squares (if $\gamma$ is tuned optimally) exhibits the same phase transition for every $0 < p < 1$, and this phase transition is much better than that of LASSO. Furthermore, we show that in the noisy setting there is a major difference between the performance of $\ell_p$-regularized least squares with different values of $p$. For instance, we will show that for very small and very large measurement noises, $p = 0$ and $p = 1$ outperform the other values of $p$, respectively.

Index Terms 

Compressed sensing, $\ell_p$-regularized least squares, LASSO, non-convex penalties, approximate message passing, state evolution, replica analysis.


I. Introduction 

A. Problem statement 

Recovering a sparse signal $x_o \in \mathbb{R}^N$ from an undersampled set of random linear measurements $y = Ax_o + w$ is the main problem of interest in compressed sensing (CS) [1], [2]. Among various schemes proposed for estimating $x_o$, $\ell_p$-regularized least squares (LPLS) has received attention for its proximity to the "intuitively optimal" $\ell_0$-minimization. LPLS estimates $x_o$ by solving

$$\hat{x}(\gamma, p) \in \arg\min_x \frac{1}{2}\|y - Ax\|_2^2 + \gamma\|x\|_p^p, \qquad (1)$$

where $\|\cdot\|_p$ ($0 < p < 1$) denotes the $\ell_p$-norm,¹ $\gamma \in (0, \infty)$ is a fixed number, and $\gamma\|x\|_p^p$ is a regularizer that promotes sparsity. The convexity of this optimization problem for $p = 1$ has made it the most accepted and the best studied scheme among all LPLSs. However, it has always been in the folklore of the compressed sensing community that solving (1) for $p < 1$ leads to more accurate solutions than the $\ell_1$-regularized least squares, also known as LASSO, since $\|x\|_p^p$ models sparsity better [?]. Inspired by this folklore theorem, many researchers have proposed iterative algorithms to obtain a local minimum of the nonconvex optimization problem (1) with $p \in [0, 1)$ [?].


The performance of such schemes is highly affected by their initialization; better initialization increases the chance of converging to the global optimum. One popular choice of initialization is the solution of LASSO [17]. This initialization has been motivated by the following heuristic: the solution of LASSO is closer to the global minimum of (1) than a random initialization. Hence it helps the iterative schemes avoid stationary points that are not the global minima of LPLS. Ignoring the computational issues, one can extend this approach to the following initialization scheme. Suppose that our goal is to solve (1) for $p = p_0$. Define an increasing sequence of numbers $p_0 < p_1 < \cdots < p_q = 1$ for some $q$. Start by solving LASSO, and then use its solution as an initialization for the iterative algorithm that attempts to solve (1) with $p_{q-1}$. Once the algorithm converges, its estimate is employed as an initialization for $p_{q-2}$. The process continues until the algorithm reaches $p_0$. We call this approach p-continuation.

Here is a heuristic motivation for p-continuation. Let $\hat{x}(\gamma, p_i)$ denote the global minimizer of $\frac{1}{2}\|y - Ax\|_2^2 + \gamma\|x\|_{p_i}^{p_i}$. Since $p_i$ and $p_{i+1}$ are close, we expect $\hat{x}(\gamma, p_i)$ and $\hat{x}(\gamma, p_{i+1})$ to be "close" as well. Hence, if the algorithm that is solving for $\hat{x}(\gamma, p_i)$ is initialized with $\hat{x}(\gamma, p_{i+1})$, then it may avoid all the local minima and converge to $\hat{x}(\gamma, p_i)$. Simulation results presented elsewhere confirm the efficiency of such initialization algorithms [?], [18].
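The p-continuation loop described above is a thin wrapper around any local solver for (1). The Python sketch below is only an illustration under our own assumptions: `local_solver(p, x_init)` is a hypothetical callable standing in for an ITLP method that returns a stationary point of (1) for the given $p$ (with $\gamma$ fixed inside it), `x_lasso` is the already computed LASSO solution ($p_q = 1$), and `p_path` lists $p_{q-1} > \cdots > p_0$.

```python
from typing import Callable, Sequence

import numpy as np


def p_continuation(
    local_solver: Callable[[float, np.ndarray], np.ndarray],
    x_lasso: np.ndarray,
    p_path: Sequence[float],
) -> np.ndarray:
    """Warm-start a (hypothetical) local LPLS solver along a decreasing path of p.

    x_lasso is the LASSO solution (p_q = 1); p_path = [p_{q-1}, ..., p_0].
    """
    path = [1.0, *p_path]
    if any(later >= earlier for earlier, later in zip(path, path[1:])):
        raise ValueError("p_path must be strictly decreasing and below 1")
    x = np.asarray(x_lasso, dtype=float)
    for p in p_path:
        # The estimate obtained for the previous (larger) p initializes the next problem.
        x = local_solver(p, x)
    return x
```

Any iterative scheme that accepts a warm start can be passed as `local_solver`; the wrapper itself only enforces the ordering of the $p$ values and the chaining of initializations.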


We can summarize our discussions in the following three folklore theorems of compressed sensing: 

(i) The global minimum of (1) for $p < 1$ outperforms the solution of LASSO. Furthermore, smaller values of $p$ lead to more accurate estimates.

(ii) There exist iterative algorithms (ITLP) capable of converging to the global minimum of (1) under "good" initialization.

(iii) p-continuation provides a "good" initialization for ITLP.

Our paper aims to evaluate the scope of validity of the above folklore beliefs in the asymptotic settings.² Toward this goal, we first study a family of message passing algorithms that aim to solve (1); we characterize the accuracy of the estimates generated by the message passing algorithm under various initializations, including the best initialization obtained by p-continuation. We finally connect our results for the message passing estimates to the analysis of the global minima $\hat{x}(\gamma, p)$ of (1) through the replica method. Here is a summary of our results, explained informally:

(i) If the measurement noise $w$ is zero or small, then the global minimum of (1) for $p < 1$ (when $\gamma$ is optimally picked) outperforms the solution of LASSO with optimal $\gamma$. Furthermore, all values of $p < 1$ have the same performance when $w = 0$. When $w$ is small, LPLS with a value of $p$ closer to 0 has better performance. However, as the variance of the measurement noise increases beyond a certain level, this folklore theorem is no longer correct. In other words, for large measurement noise, the solution of LASSO outperforms the solution of LPLS for every $0 < p < 1$.

(ii) We introduce approximate message passing algorithms that are capable of converging to the global minimum of (1) under "good" initialization (in the asymptotic settings). We call these algorithms $\ell_p$-AMP.

(iii) The "performance" of the message passing algorithm under p-continuation is equivalent to the "performance" of the message passing algorithm for solving (1) with the best value of $p$. As a particular consequence of this result, we note that p-continuation can only slightly improve the phase transition of $\ell_1$-AMP. p-continuation is mainly useful when the noise is low and $x_o$ has very few nonzero coefficients.

¹ The $\ell_p$-norm of a vector $x = [x_1, x_2, \ldots, x_N]^T$ is defined as $\|x\|_p = \left(\sum_{i=1}^N |x_i|^p\right)^{1/p}$.

² Parts of our results that are presented in Section V are based on the replica analysis [?]. The replica method is a non-rigorous but widely accepted technique from statistical physics for studying large disordered systems. Hence the results we present in Section V are not fully rigorous.

There have been recent efforts to formally prove some of the above folklore theorems. Below, we briefly review some of these studies and their similarities and differences with our work. Among the three folklore results we have discussed so far, the first one is the best studied. In particular, many researchers have tried to confirm that, at least in the noiseless setting ($w = 0$), the global minimum of (1) for $p < 1$ outperforms the solution of LASSO. Toward this goal, [19], [20] have employed popular analysis tools such as the well-known restricted isometry property and derived the conditions under which (1) recovers $x_o$ accurately. We briefly mention the results of [?] to emphasize the strengths and weaknesses of this approach. Let the elements of $A$ be iid $N(0, 1)$ and $y = Ax_o$, where $x_o$ is $k$-sparse, meaning it has only $k$ nonzero elements. If $n > C_1(p)k + pC_2(p)k\log(N/k)$, then the optimization problem

$$\min_x \|x\|_p^p \quad \text{subject to} \quad y = Ax$$

recovers $x_o$ with high probability. Furthermore, $C_1(p)$ and $pC_2(p)$ are increasing functions of $p$. Hence the lower bound derived for the required number of measurements decreases as $p$ decreases. This may be an indication that smaller values of $p$ lead to better recovery algorithms. However, note that this result only offers a sufficient condition for recovery, and hence any conclusion drawn from such results about the strengths of these algorithms may be misleading.³

To provide a more reliable comparison among different algorithms, many researchers have analyzed these algorithms in the asymptotic setting $N \to \infty$ (while $\epsilon = k/N$ and $\delta = n/N$ are fixed) [?], [?], [15], [21]. This is the framework that we adopt in our analysis too. We review these four papers in more detail and compare them with our work. Stojnic and Wang et al. [?], [21] consider the noiseless setting and try to characterize the boundary between the success region (in which (1) recovers $x_o$ exactly with probability one) and the failure region. This boundary is known as the phase transition curve (PTC).⁴ The characterization of the PTC in [?] is only accurate for the case $p = 0$. Also, the analysis of [21] is sharp only for $\delta \to 1$. Our paper derives the exact value of the PTC for any value of $0 < p < 1$ and any value of $\delta$. Furthermore, we present an accurate calculation of the risk of $\hat{x}(\gamma, p)$ in the presence of noise and compare the accuracy of $\hat{x}(\gamma, p)$ for different values of $p$. However, unlike [?] and [21], part of our analysis, presented in Section V, is based on the replica method and is not yet fully rigorous. Note that all the results we present for approximate message passing are rigorous; we only employ the replica method to show the connection between the solution of AMP and $\hat{x}(\gamma, p)$.

The replica method has been employed for studying (1) in [?], [15] to derive the fixed-point equations that describe the performance of $\hat{x}(\gamma, p)$ (under the asymptotic settings). These equations are discussed in Section V. To provide a fair comparison of the performance of $\hat{x}(\gamma, p)$ among different $p$, one should analyze the fixed points of these equations under the optimal tuning of the parameter $\gamma$. Such analysis is missing in both papers. In this paper, by employing the minimax framework, we are able to analyze the fixed points and provide a sharp characterization of the phase transition of (1) and its noise sensitivity for the first time. In addition, we present algorithms whose asymptotic behavior can be characterized by the same fixed-point equations as the ones derived from the replica method. The minimax framework enables us to analyze the stationary points at which the algorithm may be trapped and to derive conditions under which the algorithm can converge to the global minimizer of (1).

As a final remark, we should emphasize that, to the best of our knowledge, the second and third folklore results have never been studied before, and our results may be considered the first contribution in this direction.

³ Note that even though we have mentioned the results for iid Gaussian matrices, they can be easily extended to many other measurement matrix ensembles.

⁴ There are some subtle discrepancies between our definition of the phase transition curve and the definitions used in these two papers. However, our results are more aligned and comparable with the results of [4].


B. Message passing and approximate message passing

One of the main building blocks of our analysis is the approximate message passing (AMP) algorithm. AMP is a fast iterative algorithm proposed originally for solving LASSO [22]. Starting from $z^0 = y$ and $x^0 = 0$, the algorithm employs the following iteration:


$$x^{t+1} = \eta_1(A^T z^t + x^t; \lambda_t),$$
$$z^t = y - Ax^t + \frac{1}{\delta} z^{t-1}\left\langle \eta_1'(A^T z^{t-1} + x^{t-1}; \lambda_{t-1})\right\rangle, \qquad (2)$$

where $x^t$ is the estimate of $x_o$ at iteration $t$ and $\delta = n/N$. Furthermore, for a vector $u = [u_1, \ldots, u_N]$, $\langle u \rangle = \sum_{i=1}^N u_i / N$. $\eta_1(u; \lambda)$ is the soft thresholding function defined as $\eta_1(u; \lambda) = (|u| - \lambda)\,\mathrm{sign}(u)\, I(|u| > \lambda)$, with $I(\cdot)$ denoting the indicator function. $\lambda$ is called the threshold parameter, and $\eta_1'$ denotes the derivative of $\eta_1$, i.e., $\eta_1'(u; \lambda) = \partial \eta_1(u; \lambda)/\partial u$. When $u$ is a vector, $\eta_1(u; \lambda)$ and $\eta_1'(u; \lambda)$ operate component-wise. In the rest of the paper, we call this algorithm $\ell_1$-AMP. It has been proved that if the entries of $A$ are iid Gaussian, then in the asymptotic settings the limit of $x^t$ corresponds to the solution of LASSO for a certain value of $\lambda$ [23], [24].
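As an illustration, the Python sketch below runs the $\ell_1$-AMP iteration (2) from $z^0 = y$, $x^0 = 0$. The threshold rule $\lambda_t = \alpha\|z^t\|_2/\sqrt{n}$ (a fixed multiple of an empirical estimate of the effective noise level), the default $\alpha = 1.5$, and the stopping tolerance are assumptions of ours; the text above leaves $\lambda_t$ unspecified at this point. For soft thresholding, $\langle \eta_1' \rangle$ is simply the fraction of entries of $A^T z^t + x^t$ whose magnitude exceeds the threshold.

```python
import numpy as np


def soft_threshold(u, lam):
    """eta_1(u; lam) = (|u| - lam) sign(u) I(|u| > lam), applied component-wise."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)


def l1_amp(y, A, alpha=1.5, iters=100, tol=1e-12):
    """l1-AMP from z^0 = y, x^0 = 0 with the assumed rule lam_t = alpha * ||z^t|| / sqrt(n)."""
    n, N = A.shape
    delta = n / N
    x = np.zeros(N)
    z = np.array(y, dtype=float)
    for _ in range(iters):
        sigma_hat = np.linalg.norm(z) / np.sqrt(n)  # empirical effective-noise level
        if sigma_hat <= tol:
            break  # residual has (numerically) vanished
        lam = alpha * sigma_hat
        u = A.T @ z + x                             # A^T z^t + x^t
        x = soft_threshold(u, lam)                  # x^{t+1}
        eta_prime_mean = np.mean(np.abs(u) > lam)   # <eta_1'(A^T z^t + x^t; lam_t)>
        z = y - A @ x + (eta_prime_mean / delta) * z  # z^{t+1}, including the Onsager term
    return x
```

For example, with $A_{ij} \sim N(0, 1/n)$, $n = 500$, $N = 1000$, and a 50-sparse $x_o$ with $\pm 1$ entries, calling `l1_amp(A @ x_o, A)` in the noiseless case should return an estimate very close to $x_o$, since these dimensions lie well inside the region where $\ell_1$ recovery succeeds.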

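First, we extend $\ell_1$-AMP to solve LPLS defined in (1). The iterations of $\ell_1$-AMP have been derived from the first-order approximation of the message passing algorithm [25] given by

$$x^{t+1}_{i \to a} = \eta_1\Big(\sum_{b \neq a} A_{bi} z^t_{b \to i}; \lambda_t\Big),$$
$$z^t_{a \to i} = y_a - \sum_{j \neq i} A_{aj} x^t_{j \to a}, \qquad (3)$$

where $x^t_{j \to a}$ and $z^t_{a \to i}$ ($i, j \in \{1, 2, \ldots, N\}$ and $a, b \in \{1, 2, \ldots, n\}$) are $2nN$ variables that must be updated at every iteration of the message passing algorithm. Compared to this (full) message passing, AMP is computationally less demanding, since it only has to update $n + N$ variables at each iteration. It is straightforward to replicate the calculations of [25] for a generic version of LPLS to obtain the following message passing algorithm:

$$x^{t+1}_{i \to a} = \eta_p\Big(\sum_{b \neq a} A_{bi} z^t_{b \to i}; \lambda_t\Big),$$
$$z^t_{a \to i} = y_a - \sum_{j \neq i} A_{aj} x^t_{j \to a}. \qquad (4)$$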

Here $\eta_p(u; \lambda) = \arg\min_x \frac{1}{2}\|u - x\|_2^2 + \lambda\|x\|_p^p$ is known as the proximal function for $\lambda\|x\|_p^p$. It is worth noting that for $p = 1$, $\eta_1(u; \lambda)$ is the soft thresholding function introduced in $\ell_1$-AMP, and for $p = 0$, $\eta_0(u; \lambda) = u\, I(|u| > \sqrt{2\lambda})$ is known as the hard thresholding function. For the other values of $p \in (0, 1)$, $\eta_p(u; \lambda)$ does not have a simple explicit form, but it can be calculated numerically. Figure 1 exhibits $\eta_p$ for different values of $p$. Note that all these proximal functions map small values of $u$ to zero and hence promote sparsity. Because of the specific shape of these functions, we may interchangeably call them threshold functions.
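For $0 < p < 1$, $\eta_p$ can be evaluated by a one-dimensional case analysis. The Python sketch below is one possible approach (we do not know which numerical method was used for Figure 1). It relies on the fact that, for $x > 0$, the derivative of $\frac{1}{2}(|u| - x)^2 + \lambda x^p$ is $g(x) = x - |u| + \lambda p x^{p-1}$, which is convex with its minimum at $x_* = (\lambda p(1-p))^{1/(2-p)}$. The only candidate positive minimizer is therefore the larger root of $g$, which we find by bisection and compare against $x = 0$. The names `eta_p_scalar` and `eta_p` are ours.

```python
import numpy as np


def eta_p_scalar(u, lam, p, iters=100):
    """Proximal map argmin_x 0.5 * (u - x)**2 + lam * |x|**p for 0 <= p <= 1.

    For p = 0 we use the convention |x|**0 = 1 if x != 0 and 0 otherwise.
    """
    a = abs(u)
    if p == 1:
        return np.sign(u) * max(a - lam, 0.0)  # soft thresholding
    if p == 0:
        return u if a > np.sqrt(2.0 * lam) else 0.0  # hard thresholding
    # For x > 0 the derivative of the objective is g(x) = x - a + lam*p*x**(p-1).
    # g is convex on (0, inf) and attains its minimum at x_star.
    x_star = (lam * p * (1.0 - p)) ** (1.0 / (2.0 - p))
    if x_star >= a or x_star - a + lam * p * x_star ** (p - 1.0) > 0.0:
        return 0.0  # no positive local minimizer exists, so x = 0 is optimal
    lo, hi = x_star, a  # g(lo) <= 0 < g(hi): bisect for the larger root of g
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - a + lam * p * mid ** (p - 1.0) > 0.0:
            hi = mid
        else:
            lo = mid
    x = 0.5 * (lo + hi)
    cost_x = 0.5 * (a - x) ** 2 + lam * x ** p
    cost_0 = 0.5 * a ** 2
    return np.sign(u) * x if cost_x < cost_0 else 0.0


# Component-wise version for arrays of u.
eta_p = np.vectorize(eta_p_scalar, otypes=[float])
```

With $\lambda = 1$ and $p = 0.5$, for instance, the output jumps from 0 to a value slightly above 1 as $|u|$ crosses 1.5, which is the discontinuity visible in Figure 1 for $p < 1$.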

Note that the iterations of (4) are computationally demanding, since they update $2nN$ messages at every iteration. Therefore, simplification of this algorithm is vital for practical purposes. One simplification that is proposed in [25] (and has led to AMP) argues that $z^t_{a \to i} = z^t_a + \Delta z^t_{a \to i} + O(1/N)$ and $x^t_{i \to a} = x^t_i + \Delta x^t_{i \to a} + O(1/N)$, where $\Delta z^t_{a \to i}, \Delta x^t_{i \to a} = O(1/\sqrt{N})$. Under this assumption, one may use a Taylor expansion of $\eta_1$ in (3) and obtain (2).

If $\eta_p(\cdot)$ were weakly differentiable, the same simplification could be applied to (4). However, according to Figure 1, $\eta_p(\cdot)$ is discontinuous for $p < 1$. This problem can be resolved by one more approximation of the message passing algorithm. In this process, we not only approximate $x^t_{i \to a}$ and $z^t_{a \to i}$, but also approximate $\eta_p(\cdot)$ by a smooth function $\tilde{\eta}_{p,h}$ constructed in the following way. We first decompose $\eta_p(u; \lambda)$ as


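$$\eta_p(u; \lambda) = S_p(u; \lambda) + D_p(u; \lambda), \qquad (5)$$

where

$$S_p(u; \lambda) = \begin{cases} \eta_p(u; \lambda) - \eta_p^-(-\tau; \lambda), & \text{if } u < -\tau,\\ 0, & \text{if } -\tau \le u \le \tau,\\ \eta_p(u; \lambda) - \eta_p^+(\tau; \lambda), & \text{if } u > \tau, \end{cases}$$

$$D_p(u; \lambda) = \begin{cases} \eta_p^-(-\tau; \lambda), & \text{if } u < -\tau,\\ 0, & \text{if } -\tau \le u \le \tau,\\ \eta_p^+(\tau; \lambda), & \text{if } u > \tau. \end{cases} \qquad (6)$$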
Here $\tau$ represents the threshold such that $\eta_p(u; \lambda) = 0$ whenever $|u| \le \tau$. The exact form of $\tau$ will be derived in Lemma [?]. Furthermore,

$$\eta_p^-(-\tau; \lambda) = \lim_{u \nearrow -\tau} \eta_p(u; \lambda), \qquad \eta_p^+(\tau; \lambda) = \lim_{u \searrow \tau} \eta_p(u; \lambda),$$

where $\nearrow$ and $\searrow$ denote convergence from the left and right, respectively. $S_p(u; \lambda)$ is a weakly differentiable function, while $D_p(u; \lambda)$ is not continuous. Let $G_h$ denote the Gaussian kernel with variance $h > 0$. We construct the following smoothed version of $\eta_p$:⁵

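$$\tilde{\eta}_{p,h}(u; \lambda) = S_p(u; \lambda) + D_{p,h}(u; \lambda), \qquad (7)$$

where $D_{p,h}(u; \lambda) = D_p(u; \lambda) * G_h(u)$. Here $*$ denotes the convolution operator. If we replace $\eta_p(\cdot)$ with $\tilde{\eta}_{p,h}(\cdot)$ in (4), we obtain a new message passing algorithm: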

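$$x^{t+1}_{i \to a} = \tilde{\eta}_{p,h}\Big(\sum_{b \neq a} A_{bi} z^t_{b \to i}; \lambda_t\Big),$$
$$z^t_{a \to i} = y_a - \sum_{j \neq i} A_{aj} x^t_{j \to a}, \qquad (8)$$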


where $h$ is assumed to be "small" to ensure that replacing $\eta_p$ with $\tilde{\eta}_{p,h}$ does not incur a major loss in the performance of the message passing algorithm. We discuss practical methods for setting $h$ in the simulation section. Since $\tilde{\eta}_{p,h}(\cdot)$ is smooth, we may apply the approximation technique proposed in [25] to obtain the following approximate message passing algorithm:


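$$x^{t+1} = \tilde{\eta}_{p,h}(A^T z^t + x^t; \lambda_t),$$
$$z^t = y - Ax^t + \frac{1}{\delta} z^{t-1}\left\langle \tilde{\eta}'_{p,h}(A^T z^{t-1} + x^{t-1}; \lambda_{t-1})\right\rangle. \qquad (9)$$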



Fig. 1. $\eta_p(u; \lambda)$ for 4 different values of $p$; $\lambda$ is set to 1.

We call this algorithm $\ell_p$-AMP. If we define $u^t = A^T z^t + x^t - x_o$, then we can write $x^{t+1} = \tilde{\eta}_{p,h}(x_o + u^t; \lambda_t)$.
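The decomposition (5)-(7) becomes fully explicit for $p = 0$. With $\tau = \sqrt{2\lambda}$, $S_0(u; \lambda)$ is soft thresholding at $\tau$ and $D_0(u; \lambda) = \tau\,\mathrm{sign}(u)\, I(|u| > \tau)$. Convolving $D_0$ with $G_h$, the $N(0, h)$ density, gives $D_{0,h}(u; \lambda) = \tau[\Phi((u - \tau)/\sqrt{h}) - \Phi((-u - \tau)/\sqrt{h})]$, where $\Phi$ is the standard normal CDF. The Python sketch below, our own illustration for $p = 0$ only, evaluates $\tilde{\eta}_{0,h}$ and its derivative, which is what the Onsager term in (9) needs.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def normal_cdf(x):
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def normal_pdf(x):
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)


def smoothed_hard_threshold(u, lam, h):
    """eta~_{0,h}(u; lam) = S_0(u; lam) + (D_0 * G_h)(u), with tau = sqrt(2 lam)."""
    u = np.asarray(u, dtype=float)
    tau = math.sqrt(2.0 * lam)
    r = math.sqrt(h)
    s = np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)  # S_0: the continuous part
    d = tau * (normal_cdf((u - tau) / r) - normal_cdf((-u - tau) / r))  # smoothed jumps
    return s + d


def smoothed_hard_threshold_deriv(u, lam, h):
    """Derivative in u, used for the average <eta~'> in the Onsager term of (9)."""
    u = np.asarray(u, dtype=float)
    tau = math.sqrt(2.0 * lam)
    r = math.sqrt(h)
    ds = (np.abs(u) > tau).astype(float)
    dd = (tau / r) * (normal_pdf((u - tau) / r) + normal_pdf((u + tau) / r))
    return ds + dd
```

As $h \to 0$, the smoothed function approaches $u\, I(|u| > \sqrt{2\lambda})$ away from $\pm\tau$, while for $h > 0$ its derivative is finite everywhere, so the Onsager average is well defined.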

One of the main features of AMP that has led to its popularity is that, for large values of $n$ and $N$, $u^t$ looks like zero-mean iid Gaussian noise. This property has been observed and characterized for different denoisers in [?], [?], and has also been proved for some special cases in [?], [31]. Since this key feature plays an important role in our paper, we start by formalizing this statement.

Let $n, N \to \infty$ while $\delta = n/N$ is fixed. In the rest of this section only, we write the vectors and matrices as $x_o(N)$, $A(N)$, $y(N)$, and $w(N)$ to emphasize their dependence on the dimension of $x_o$. Clearly, matrix $A$ has $\delta N$ rows, but since we assume that $\delta$ is fixed, we do not include $n$ in our notation for $A$. The same argument applies to $y(N)$ and $w(N)$. The following definition, adopted from [31], formalizes the asymptotic setting in which $\ell_p$-AMP is studied.

⁵ This smoothing is also proposed for the hard thresholding function in [26] for a different purpose.


Definition 1. A sequence of instances $\{x_o(N), A(N), w(N)\}$ is called a converging sequence if the following conditions hold:

- The empirical distribution of $x_o(N) \in \mathbb{R}^N$ converges weakly to a probability measure $p_X$ with bounded second moment. Further, $\frac{1}{N}\|x_o(N)\|_2^2$ converges to the second moment of $p_X$.

- The empirical distribution of $w(N) \in \mathbb{R}^n$ ($n = \delta N$) converges weakly to a probability measure $N(0, \sigma_w^2)$. Furthermore, $\frac{1}{n}\|w(N)\|_2^2$ converges to $\sigma_w^2$.

- $A_{ij} \sim N(0, 1/n)$ iid.

The following theorem not only formalizes the "Gaussianity" of $u^t$, but also provides a simple way to characterize its variance.

Theorem 1. Let $\{x_o(N), A(N), w(N)\}$ denote a converging sequence of instances. Let $x^t(N, h)$ denote the estimates provided by $\ell_p$-AMP according to (9). Let $h_1, h_2, \ldots$ denote a decreasing sequence of numbers that satisfy $h_i > 0$ and $h_i \to 0$ as $i \to \infty$. Then,

$$\lim_{i \to \infty} \lim_{N \to \infty} \frac{\|x^{t+1}(N, h_i) - x_o(N)\|_2^2}{N} = \mathbb{E}\left(|\eta_p(X + \sigma_t Z; \lambda_t) - X|^2\right),$$

where $\sigma_t$ satisfies the following iteration:

$$\sigma_{t+1}^2 = \sigma_w^2 + \frac{1}{\delta}\mathbb{E}\left(|\eta_p(X + \sigma_t Z; \lambda_t) - X|^2\right). \qquad (10)$$

Here the expected value is with respect to two independent random variables $Z \sim N(0, 1)$ and $X \sim p_X$.⁶

The proof of this statement is presented in Section VI-C.

Note that $\sigma_t$ only depends on $\sigma_{t-1}$ and the selected threshold value at iteration $t - 1$. This important feature of AMP will be used later in our paper. $\sigma_t$ and the relation between $\sigma_t$ and $\sigma_{t-1}$ are called the state of $\ell_p$-AMP and the state evolution, respectively.
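State evolution (10) is a scalar recursion, so it can be evaluated numerically by replacing the expectation with an average over fixed Monte Carlo draws of $X$ and $Z$. The Python sketch below does so for $p = 1$ only (so that $\eta_p$ has a closed form) and under two assumptions of ours: the three-point prior $P(X = \pm 1) = \epsilon/2$, $P(X = 0) = 1 - \epsilon$, and the threshold rule $\lambda_t = \alpha\sigma_t$. Reusing the same draws at every iteration makes the computed trajectory deterministic.

```python
import numpy as np


def soft_threshold(u, lam):
    """eta_1(u; lam), applied component-wise."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)


def sample_three_point_prior(eps, size, rng):
    """Assumed prior: X = +1 or -1 with probability eps/2 each, X = 0 otherwise."""
    signs = rng.choice([-1.0, 1.0], size=size)
    return signs * (rng.random(size) < eps)


def state_evolution(sigma2_0, delta, sigma_w2, alpha, X, Z, iters=200):
    """Iterate (10) for p = 1 with lam_t = alpha * sigma_t.

    The expectation over (X, Z) is replaced by an average over the fixed samples
    X and Z; returns the trajectory sigma_0^2, sigma_1^2, ..., sigma_iters^2.
    """
    sigma2 = float(sigma2_0)
    history = [sigma2]
    for _ in range(iters):
        sigma = np.sqrt(sigma2)
        mse = np.mean((soft_threshold(X + sigma * Z, alpha * sigma) - X) ** 2)
        sigma2 = sigma_w2 + mse / delta
        history.append(sigma2)
    return np.array(history)
```

For example, with $\epsilon = 0.05$, $\delta = 0.5$, $\alpha = 1.5$, and $\sigma_0^2 = \sigma_w^2 + \mathbb{E}(X^2)/\delta$, the noiseless trajectory decays geometrically toward $\sigma^2 = 0$, while with $\sigma_w^2 = 0.01$ it settles at a fixed point slightly above $\sigma_w^2$.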


C. Summary and organization of the paper 

In this paper, we consider $\ell_p$-AMP as a heuristic algorithm for solving $\ell_p$-minimization and analyze its performance through the state evolution. We then use the replica method to connect the $\ell_p$-AMP estimates to the solution of (1). Our analysis examines the correctness of all the folklore theorems discussed in Section I-A. The remainder of this paper is organized as follows: Section II introduces the optimally tuned $\ell_p$-AMP algorithm and the optimal p-continuation strategy. Sections III and IV formally present our main contributions. Section V discusses our results and their connection with the $\ell_p$-regularized least squares problem defined in (1). Section VI is devoted to the proofs of our main contributions. Section VII demonstrates how we can implement the optimally tuned $\ell_p$-AMP in practice and studies some of the properties of this algorithm. Section VIII concludes the paper.


II. Optimal $\ell_p$-AMP

A. Roadmap 

The performance of $\ell_p$-AMP depends on the choice of the threshold parameters $\lambda_t$. Any fair comparison between $\ell_p$-AMP for different values of $p$ must take this fact into account. In this section, we start by explaining how we set the parameters $\lambda_t$. Then, in Section III, we analyze $\ell_p$-AMP.


⁶ $\sigma_0$ depends on the initialization of the algorithm. If $\ell_p$-AMP is initialized at zero, then $\sigma_0^2 = \sigma_w^2 + \mathbb{E}(X^2)/\delta$.
B. Fixed points of state evolution

According to the state evolution in (10), the only difference among different iterations of $\ell_p$-AMP is the standard deviation $\sigma_t$. In the rest of the paper, for notational simplicity, instead of discussing a sequence of threshold values for $\ell_p$-AMP, we consider a thresholding policy that is defined as a function $\lambda(\sigma)$ for $\sigma > 0$. Given a thresholding policy, we can run $\ell_p$-AMP in the following way:




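$$x^{t+1} = \tilde{\eta}_{p,h}(A^T z^t + x^t; \lambda(\sigma_t)),$$
$$z^t = y - Ax^t + \frac{1}{\delta} z^{t-1}\left\langle \tilde{\eta}'_{p,h}(A^T z^{t-1} + x^{t-1}; \lambda(\sigma_{t-1}))\right\rangle. \qquad (11)$$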


In practice, $\sigma_t$ is not known, but it can be estimated accurately [25]. We will mention an estimate of $\sigma_t$ in the simulation section. Note that by using a thresholding policy, we have imposed a constraint on the threshold values. In Section II-E, we will show that, for the purpose of this paper, considering only thresholding policies does not degrade the performance of $\ell_p$-AMP.


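According to Theorem 1, the performance of $\ell_p$-AMP in (11) (in the limit $h_i \to 0$) can be predicted by the following state evolution:

$$\sigma_{t+1}^2 = \sigma_w^2 + \frac{1}{\delta}\mathbb{E}\left(|\eta_p(X + \sigma_t Z; \lambda(\sigma_t)) - X|^2\right). \qquad (12)$$

Inspired by the state evolution equation, we define the following function:

$$\Psi_{\lambda,p}(\sigma^2) = \sigma_w^2 + \frac{1}{\delta}\mathbb{E}\left(|\eta_p(X + \sigma Z; \lambda(\sigma)) - X|^2\right). \qquad (13)$$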


It is straightforward to confirm that the iterations of (12) converge to a fixed point of $\Psi_{\lambda,p}(\sigma^2)$. There are a few points that we would like to highlight about the fixed points of $\Psi_{\lambda,p}(\sigma^2)$:

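(i) $\Psi_{\lambda,p}(\sigma^2)$ usually has more than one fixed point. If so, the fixed point that $\ell_p$-AMP converges to depends on the initialization of the algorithm. This is depicted in Figure 2(a).

(ii) Lower fixed points correspond to better recoveries. To see this, consider two fixed points $\sigma_{f1} < \sigma_{f3}$ in Figure 2(a), and call the corresponding estimates of AMP $\hat{x}_{f1}$ and $\hat{x}_{f3}$. According to Theorem 1, the mean square errors of these two estimates (as $N \to \infty$ and $h \to 0$) converge to $\mathbb{E}(|\eta_p(X + \sigma_{f1} Z; \lambda(\sigma_{f1})) - X|^2)$ and $\mathbb{E}(|\eta_p(X + \sigma_{f3} Z; \lambda(\sigma_{f3})) - X|^2)$, respectively. Furthermore, since both of them are fixed points, we have

$$\sigma_w^2 + \frac{1}{\delta}\mathbb{E}\left(|\eta_p(X + \sigma_{f1} Z; \lambda(\sigma_{f1})) - X|^2\right) = \sigma_{f1}^2 < \sigma_{f3}^2 = \sigma_w^2 + \frac{1}{\delta}\mathbb{E}\left(|\eta_p(X + \sigma_{f3} Z; \lambda(\sigma_{f3})) - X|^2\right).$$

Hence

$$\mathbb{E}\left(|\eta_p(X + \sigma_{f1} Z; \lambda(\sigma_{f1})) - X|^2\right) < \mathbb{E}\left(|\eta_p(X + \sigma_{f3} Z; \lambda(\sigma_{f3})) - X|^2\right).$$

Therefore, lower fixed points lead to smaller mean square reconstruction errors.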

(iii) Two of the fixed points of $\Psi_{\lambda,p}(\sigma^2)$ are of particular interest in this paper: (1) the lowest fixed point, which indicates the performance one can achieve from $\ell_p$-AMP under the best initialization (as we will discuss later, this fixed point is also related to the solution of LPLS); and (2) the highest fixed point, which indicates the performance $\ell_p$-AMP exhibits under the worst initialization.

(iv) The shape of $\Psi_{\lambda,p}$ and its fixed points depend on the distribution $p_X$. In this work we study $p_X \in \mathcal{F}_\epsilon$, where $\mathcal{F}_\epsilon$ denotes the set of distributions whose mass at zero is greater than or equal to $1 - \epsilon$. In other words, $X \sim p_X$ implies that $P(X \neq 0) \le \epsilon$. This class of distributions has been studied in many other papers [15], [22], [?], and is considered a good model for exactly sparse signals.

Before discussing the optimal thresholding policy, we should distinguish between three types of fixed points: (i) stable, (ii) unstable, and (iii) half-stable. The following definitions can be used for any function of $\sigma^2$, but we introduce them for $\Psi_{\lambda,p}(\sigma^2)$ to avoid introducing new notation.


Definition 2. $\sigma_f$ is called a stable fixed point of $\Psi_{\lambda,p}(\sigma^2)$ if and only if there exists an open interval $I$, with $\sigma_f \in I$, such that for every $\sigma > \sigma_f$ in $I$, $\Psi_{\lambda,p}(\sigma^2) < \sigma^2$, and for every $\sigma < \sigma_f$ in $I$, $\Psi_{\lambda,p}(\sigma^2) > \sigma^2$. We call $0$ a stable fixed point of $\Psi_{\lambda,p}(\sigma^2)$ if and only if $\Psi_{\lambda,p}(0) = 0$ and there exists $\sigma_1 > 0$ such that for every $0 < \sigma < \sigma_1$, $\Psi_{\lambda,p}(\sigma^2) < \sigma^2$.

Fig. 2. The shapes of $\Psi_{\lambda,p}(\sigma^2)$ and its fixed points. (a) Depending on the initialization $\sigma_0$, the state evolution converges either to $\sigma_{f1}$ or to $\sigma_{f3}$. According to Definitions 2 and 3, $\sigma_{f1}$ and $\sigma_{f3}$ are stable fixed points, while $\sigma_{f2}$ is the unstable fixed point. (b) $\sigma_{f2}$ is a half-stable fixed point: the algorithm will converge to this fixed point if it starts in its right neighborhood. Here $\sigma_{f1}$ is again a stable fixed point.

In Figure 2(a), both $\sigma_{f1}$ and $\sigma_{f3}$ are stable fixed points, while $\sigma_{f2}$ is not stable. The main feature of a stable fixed point is the following: there exists a neighborhood of $\sigma_f$ such that, if we initialize $\ell_p$-AMP in it, the algorithm will converge to $\sigma_f$.⁷

Definition 3. $\sigma_f$ is called an unstable fixed point of $\Psi_{\lambda,p}(\sigma^2)$ if and only if there exists an open interval $I$, with $\sigma_f \in I$, such that for every $\sigma > \sigma_f$ in $I$, $\Psi_{\lambda,p}(\sigma^2) > \sigma^2$, and for every $\sigma < \sigma_f$ in $I$, $\Psi_{\lambda,p}(\sigma^2) < \sigma^2$. We call $0$ an unstable fixed point of $\Psi_{\lambda,p}(\sigma^2)$ if and only if $\Psi_{\lambda,p}(0) = 0$ and there exists $\sigma_1 > 0$ such that for every $0 < \sigma < \sigma_1$, $\Psi_{\lambda,p}(\sigma^2) > \sigma^2$.

Note that the state evolution will not converge to an unstable fixed point unless it is initialized exactly at that point. Hence, in realistic situations, $\ell_p$-AMP will not converge to unstable fixed points.

Definition 4. A fixed point is called half-stable if it is neither stable nor unstable. 

See Figure 2(b) for an example of a half-stable fixed point. Half-stable fixed points occur only in very rare situations and for very specific noise levels $\sigma_w^2$.
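Definitions 2 and 3 suggest a simple numerical check: scan an increasing grid of $\sigma^2$ values, locate sign changes of $\Psi(\sigma^2) - \sigma^2$, and read off stability from the direction of the change (positive to negative is stable, negative to positive is unstable; since $\sigma \mapsto \sigma^2$ is increasing, working in $\sigma^2$ preserves these signs). The Python sketch below is our own illustration. `psi` is any callable returning $\Psi_{\lambda,p}(\sigma^2)$, for instance a Monte Carlo estimate of (13), and the grid is chosen by the user. Half-stable points, where $\Psi(\sigma^2) - \sigma^2$ touches zero without changing sign, and fixed points lying exactly on a grid point are not detected by this test.

```python
import numpy as np


def classify_fixed_points(psi, grid, bisect_iters=60):
    """Locate sign changes of psi(s) - s on an increasing grid of s = sigma^2 > 0.

    Returns a list of (s_f, label) pairs with label "stable" or "unstable",
    following Definitions 2 and 3.
    """
    grid = np.asarray(grid, dtype=float)
    diff = np.array([psi(s) - s for s in grid])
    found = []
    for i in range(len(grid) - 1):
        d_lo, d_hi = diff[i], diff[i + 1]
        if d_lo * d_hi >= 0.0:
            continue  # no strict sign change inside this cell
        lo, hi = grid[i], grid[i + 1]
        for _ in range(bisect_iters):
            mid = 0.5 * (lo + hi)
            if (psi(mid) - mid) * d_lo > 0.0:
                lo = mid  # same sign as the left end: root lies to the right
            else:
                hi = mid
        label = "stable" if d_lo > 0.0 else "unstable"
        found.append((0.5 * (lo + hi), label))
    return found
```

On a toy map such as $\Psi(s) = s - (s - 1)(s - 2)(s - 3)$, the scan reports stable points at $s = 1$ and $s = 3$ and an unstable point at $s = 2$, the configuration sketched in Figure 2(a).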

C. Optimal-$\lambda$ $\ell_p$-AMP

In the last section, we discussed the role of the fixed points of $\Psi_{\lambda,p}(\sigma^2)$ in the performance of $\ell_p$-AMP. Note that the locations of the fixed points of $\Psi_{\lambda,p}(\sigma^2)$ depend on the thresholding policy $\lambda(\sigma)$. Hence, it is important to pick $\lambda(\sigma)$ optimally. Consider the following oracle thresholding policy:

$$\lambda^*(\sigma) \in \arg\min_{\lambda} \mathbb{E}\left(|\eta_p(X + \sigma Z; \lambda) - X|^2\right), \qquad (14)$$

where the expected value is with respect to two independent random variables $X \sim p_X$ and $Z \sim N(0, 1)$. $\lambda^*(\sigma)$ is called the oracle thresholding policy, since it depends on $p_X$, which is not available in practice. In Section VII, we explain how this thresholding policy can be implemented in practice. The following lemma is a simple corollary of our definition.


⁷ Note that all the statements we make about AMP concern the asymptotic settings.
Fig. 3. Comparison of $\Psi_{\lambda^*,p}(\sigma^2)$ for $p = 0, 0.3, 0.6, 0.9, 1$. The undersampling factor and sparsity are $\delta = 0.1$ and $\epsilon = 0.02$, respectively. The nonzero entries are $\pm 1$ equiprobable.


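Lemma 1. For every thresholding policy $\lambda(\sigma)$, we have

$$\Psi_{\lambda^*,p}(\sigma^2) \le \Psi_{\lambda,p}(\sigma^2) \quad \text{for all } \sigma \ge 0.$$

Hence, both the lowest and highest stable fixed points of $\Psi_{\lambda^*,p}(\sigma^2)$ are no larger than the corresponding fixed points of $\Psi_{\lambda,p}(\sigma^2)$.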

The proof of the above lemma is a simple implication of the definition of the oracle thresholding policy in (14) and is hence skipped here. According to this lemma, the oracle thresholding policy is an optimal thresholding policy, since it leads to the lowest possible fixed point. In the rest of the paper, we call $\lambda^*(\sigma)$ the optimal thresholding policy, and the $\ell_p$-AMP algorithm that employs it optimal-$\lambda$ $\ell_p$-AMP. The optimal thresholding policy can be calculated numerically. Figure 3 exhibits $\Psi_{\lambda^*,p}(\sigma^2)$ for $p = 0, 0.3, 0.6, 0.9, 1$ when the nonzero entries of the sparse vector $x_o$ are $\pm 1$ with probability 0.5. It turns out that $\Psi_{\lambda^*,p}(\sigma^2)$ has at least one stable fixed point. The following proposition proves this claim.

Proposition 1. $\Psi_{\lambda^*,p}(\sigma^2)$ has at least one stable fixed point.


The proof of this statement is presented in Section VI-D.
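The oracle policy (14) can be approximated by a grid search over $\lambda$ combined with a Monte Carlo estimate of the risk. The Python sketch below does this for $p \in \{0, 1\}$, where $\eta_p$ has a closed form. The sample-based prior (nonzeros equal to $\pm 1$, as in Figure 3), the sample size, and the $\lambda$ grid are assumptions of ours, and the result is only as accurate as the grid and the Monte Carlo average.

```python
import numpy as np


def eta(u, lam, p):
    """Closed-form proximal maps: soft thresholding (p = 1) or hard thresholding (p = 0)."""
    if p == 1:
        return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
    if p == 0:
        return u * (np.abs(u) > np.sqrt(2.0 * lam))
    raise ValueError("this sketch only covers p in {0, 1}")


def oracle_threshold(sigma, p, X, Z, lam_grid):
    """Grid approximation of lambda*(sigma) in (14).

    The expectation over X ~ p_X and Z ~ N(0, 1) is replaced by an average over
    the given samples; returns (lambda*, estimated minimal risk).
    """
    u = X + sigma * Z
    risks = np.array([np.mean((eta(u, lam, p) - X) ** 2) for lam in lam_grid])
    best = int(np.argmin(risks))
    return float(lam_grid[best]), float(risks[best])
```

Repeating the search over a grid of $\sigma$ values traces out an approximation of $\lambda^*(\sigma)$; plugging it into (13) yields curves of the kind shown in Figure 3.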


D. Optimal-$(p, \lambda)$ $\ell_p$-AMP

In the last section, we fixed $p$ and optimized over the threshold parameter $\lambda_t$. However, one can also consider $p \in [0, 1]$ as a free parameter that can be tuned at every iteration. This extra degree of freedom, if employed optimally, can potentially improve the performance of $\ell_p$-AMP. To derive the optimal choice of $p$, we first extend the notion of thresholding policy to adaptation policy. An adaptation policy is defined as a tuple $(\lambda(\sigma), p(\sigma))$, where $\lambda : [0, \infty) \to [0, \infty)$ and $p : [0, \infty) \to [0, 1]$.

Given an adaptation policy, one can run the $\ell_p$-AMP algorithm, whose performance in the asymptotic setting can be predicted by the following state evolution equation:

$$\sigma_{t+1}^2 = \sigma_w^2 + \frac{1}{\delta}\mathbb{E}\left(|\eta_{p(\sigma_t)}(X + \sigma_t Z; \lambda(\sigma_t)) - X|^2\right). \qquad (15)$$

Hence the state evolution converges to one of the fixed points of $\Psi_{\lambda(\sigma),p(\sigma)}(\sigma^2)$. An adaptation policy can potentially improve the performance of the $\ell_p$-AMP algorithm. In this paper, we consider the following oracle adaptation policy:

$$(\lambda^*(\sigma), p^*(\sigma)) \in \arg\min_{\lambda, p} \mathbb{E}\left(|\eta_p(X + \sigma Z; \lambda) - X|^2\right). \qquad (16)$$
Fig. 4. Comparison of (a) $\Psi_{\lambda^*,0}(\sigma^2)$, (b) $\Psi_{\lambda^*,1}(\sigma^2)$, and (c) $\Psi_{\lambda^*,p^*}(\sigma^2)$; $\delta$ and $\epsilon$ are set to 0.1 and 0.02, respectively. For small values of $\sigma$, $p = 0$ is optimal, and for large values of $\sigma$, $p = 1$ is optimal. These two observations will be formally addressed in Propositions [?] and 3.


Note that obtaining $(\lambda^*(\sigma), p^*(\sigma))$ requires knowledge of $p_X$. In Section VII, we show how a good estimate of $(\lambda^*(\sigma), p^*(\sigma))$ can be obtained without any knowledge of $p_X$. The $\ell_p$-AMP algorithm that employs $(\lambda^*(\sigma), p^*(\sigma))$ is called optimal-$(p, \lambda)$ $\ell_p$-AMP. The following theorem clarifies this terminology:

Theorem 2. For any adaptation policy $(\lambda(\sigma), p(\sigma))$, we have

$$\Psi_{\lambda^*(\sigma), p^*(\sigma)}(\sigma^2) \le \Psi_{\lambda(\sigma), p(\sigma)}(\sigma^2) \quad \text{for all } \sigma \ge 0.$$

The proof is a simple implication of (16) and is hence skipped. According to Theorem 2, the oracle adaptation policy is optimal: it performs at least as well as every other adaptation policy. Hence we call it the optimal adaptation policy. Note that in all situations, optimal-$(p, \lambda)$ $\ell_p$-AMP performs at least as well as optimal-$\lambda$ $\ell_p$-AMP (for any $0 \le p \le 1$). In the next two sections, we characterize the amount of improvement that is gained from the optimal adaptation policy.
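The oracle adaptation policy (16) adds $p$ to the search. The Python sketch below restricts $p$ to $\{0, 1\}$, an assumption made so that $\eta_p$ is closed-form (the text allows any $p \in [0, 1]$), and picks the $(\lambda, p)$ pair with the smallest Monte Carlo risk at a given $\sigma$. The sample-based prior and the $\lambda$ grid are again our own choices.

```python
import numpy as np


def eta(u, lam, p):
    """Closed-form proximal maps: soft thresholding (p = 1) or hard thresholding (p = 0)."""
    if p == 1:
        return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
    if p == 0:
        return u * (np.abs(u) > np.sqrt(2.0 * lam))
    raise ValueError("this sketch only covers p in {0, 1}")


def oracle_adaptation(sigma, X, Z, lam_grid, p_values=(0, 1)):
    """Grid approximation of (lambda*(sigma), p*(sigma)) in (16).

    Returns (lambda*, p*, estimated minimal risk), with the expectation replaced
    by an average over the samples X and Z.
    """
    u = X + sigma * Z
    best = None
    for p in p_values:
        for lam in lam_grid:
            risk = float(np.mean((eta(u, lam, p) - X) ** 2))
            if best is None or risk < best[2]:
                best = (float(lam), p, risk)
    return best
```

For example, with nonzeros equal to $\pm 1$, $\epsilon = 0.02$, and $\sigma = 0.05$, the search selects $p = 0$, in line with the small-$\sigma$ behavior described in the caption of Figure 4. Because $p$ is restricted to $\{0, 1\}$, the sketch cannot reproduce the intermediate values of $p$ that (16) allows.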

In this paper, we analyze the performance of $\ell_p$-AMP with the optimal thresholding and adaptation policies. We then employ the replica method to show the implications of our results for LPLS.


E. Discussion about thresholding policy and adaptation policy 

Starting with an initialization, one may run $\ell_p$-AMP with thresholds $\lambda_1, \lambda_2, \ldots$ until the algorithm converges. $\lambda_t$ may depend not only on $\sigma_t$, but also on the entire information about $\Psi$. In that sense, it is conceivable that one could pick the thresholds in a way that beats $\ell_p$-AMP with the optimal thresholding policy. Suppose that the lowest stable fixed point of $\Psi_{\lambda^*,p}(\sigma^2)$ is denoted by $\sigma_l$. Also, suppose that $\Psi_{\lambda^*,p}(\sigma^2)$ does not have any unstable fixed point below $\sigma_l$. Consider an oracle who runs $\ell_p$-AMP with a good initialization (whatever he/she wants) and picks a converging sequence $\lambda_1, \lambda_2, \ldots$ for the thresholds. Assume that the corresponding sequence $\sigma_t$ converges to $\sigma_\infty$. It is then straightforward to show that, no matter which thresholds the oracle picks, he/she ends up with $\sigma_\infty \ge \sigma_l$. Hence, the lowest fixed point of $\Psi_{\lambda^*,p}(\sigma^2)$ specifies the best performance $\ell_p$-AMP offers.⁸

Similarly, consider an oracle who runs $\ell_p$-AMP with a good choice of $(p_1, \lambda_1), (p_2, \lambda_2), \ldots$. Again we can argue that if $\sigma_t \to \sigma_\infty$, then $\sigma_\infty^2 \ge \tilde{\sigma}_l^2$, where $\tilde{\sigma}_l^2$ denotes the lowest fixed point of $\Psi_{\lambda^*,p^*}(\sigma^2)$. Note that treating $p$ as a free parameter and changing it at every iteration can be considered a generalization of the continuation strategy we discussed in the introduction. Hence, $\tilde{\sigma}_l^2$ reflects the best performance any continuation strategy may achieve.
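To make the lowest and highest stable fixed points concrete, the Python sketch below (our own illustration under simplifying assumptions; all names are hypothetical) iterates an optimally tuned state evolution map for $p = 0$, namely $\Psi(\sigma^2) = \sigma_w^2 + \frac{1}{\delta}\min_t \mathbb{E}\big(\eta_0(X + \sigma Z) - X\big)^2$ with hard threshold $t$, for the hypothetical prior $X \sim (1-\epsilon)\Delta_0 + \epsilon\,\mathrm{Unif}\{-\mu, +\mu\}$. The risk is computed in closed form from Gaussian integrals, and the minimization over $t$ is restricted to a grid on $[0, 10\sigma]$. Starting from a small $\sigma_0^2$ approximates the lowest fixed point; starting from the large value $\sigma_w^2 + \epsilon\mu^2/\delta$ approximates the highest one.

```python
import math


def Q(x):
    """Standard normal upper tail P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def hard_risk(t, sigma, eps, mu):
    """E(eta_0(X + sigma Z) - X)^2 for hard threshold t and X ~ (1-eps) D_0 + eps Unif{-mu, +mu}."""
    a = t / sigma
    zero_part = 2.0 * sigma**2 * (a * phi(a) + Q(a))  # E[(sigma Z)^2; |sigma Z| > t]
    b1, b2 = (-t - mu) / sigma, (t - mu) / sigma  # mu + sigma Z is zeroed iff b1 <= Z <= b2
    p_kill = Q(b1) - Q(b2)
    z2_kill = b1 * phi(b1) - b2 * phi(b2) + p_kill  # E[Z^2; b1 <= Z <= b2]
    nonzero_part = sigma**2 * (1.0 - z2_kill) + mu**2 * p_kill
    return (1.0 - eps) * zero_part + eps * nonzero_part


def psi(s2, delta, eps, mu, s2w, grid=400, cmax=10.0):
    """Optimally tuned (over a threshold grid) state evolution map for p = 0."""
    if s2 <= 0.0:
        return s2w
    sigma = math.sqrt(s2)
    best = min(hard_risk(cmax * k / grid * sigma, sigma, eps, mu) for k in range(grid + 1))
    return s2w + best / delta


def fixed_point(s2, delta, eps, mu, s2w, iters=300):
    """Iterate sigma_{t+1}^2 = Psi(sigma_t^2) starting from s2."""
    for _ in range(iters):
        s2 = psi(s2, delta, eps, mu, s2w)
    return s2


delta, eps, mu, s2w = 0.1, 0.01, 1.0, 1e-4
low = fixed_point(1e-6, delta, eps, mu, s2w)
high = fixed_point(s2w + eps * mu**2 / delta, delta, eps, mu, s2w)
print(f"lowest ~ {low:.4e}, highest ~ {high:.4e}, lowest/s2w = {low / s2w:.4f}")
```

For this small noise level the lowest fixed point should sit close to the small-noise limit $\delta\sigma_w^2/(\delta - \epsilon)$ discussed in Section IV-B; whether the highest fixed point coincides with it depends on $\delta$, $\epsilon$ and $\mu$.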

The optimal thresholding policy has many more optimality properties if $\eta_p$ satisfies the monotonicity property. For more information about monotonicity and its implications refer to [30]. We believe $\eta_p$ satisfies the monotonicity property, but have left the mathematical justification of this fact for future research.
TABLE I

Summary of our findings in both noiseless and noisy settings

Noiseless setting ($\sigma_w = 0$):

Phase transition curve for the highest fixed point of optimal-$\lambda$ $\ell_p$-AMP under the least favorable distribution:
  $M_p(\epsilon) = \delta$ ($0 < p < 1$, Theorem 3)
  $M_1(\epsilon) = \delta$ ($p = 1$, Proposition 2)
  Conclusion: minor improvement of $\ell_p$ over $\ell_1$.

Phase transition curve for the lowest fixed point of optimal-$\lambda$ $\ell_p$-AMP:
  $\epsilon = \delta$ ($0 < p < 1$, Theorem 4)
  $M_1(\epsilon) = \delta$ ($p = 1$, Proposition 2)
  Conclusion: major improvement of $\ell_p$ over $\ell_1$.

Phase transition curve for the highest fixed point of optimal-$(p, \lambda)$ $\ell_p$-AMP under the least favorable distribution:
  $\inf_{0<p<1} M_p(\epsilon) = \delta$ (Theorem 5)
  Conclusion: minor improvement over $\ell_1$.

Noisy setting ($\sigma_w > 0$):

Noise sensitivity for the highest fixed point of optimal-$\lambda$ $\ell_p$-AMP under the least favorable distribution:
  $\sigma_h^2/\sigma_w^2 \le 1/(1 - M_p(\epsilon)/\delta)$ ($0 < p < 1$, Theorem 10)
  Conclusion: minor improvement of $\ell_p$ over $\ell_1$.

Noise sensitivity for the lowest fixed point of optimal-$\lambda$ $\ell_p$-AMP:
  $\lim_{\sigma_w \to 0} \sigma_l^2/\sigma_w^2 = \delta/(\delta - \epsilon)$ ($0 < p < 1$, Theorem 6)
  $\lim_{\sigma_w \to 0} \sigma^2/\sigma_w^2 = \delta/(\delta - M_1(\epsilon))$ ($p = 1$, Theorem 7)
  Conclusion: major improvement of $\ell_p$ over $\ell_1$.

Noise sensitivity for the highest fixed point of optimal-$(p, \lambda)$ $\ell_p$-AMP under the least favorable distribution:
  $\lim_{\sigma_w \to 0} \sigma_h^2/\sigma_w^2 = \delta/(\delta - \epsilon)$ (Theorem 11)
  $\lim_{\sigma_w \to 0} \sigma^2/\sigma_w^2 = \delta/(\delta - M_1(\epsilon))$ ($p = 1$, Theorem 7)
  Conclusion: major improvement over $\ell_1$.

*Note that all the results shown for the highest fixed point are sharp for the least favorable distributions (based on the identity $M_p(\epsilon) = \underline{M}_p(\epsilon)$ confirmed by our simulations). However, for some specific distributions we may observe major improvement of $\ell_p$ over $\ell_1$. See Figure 9 for further information. We will also present a more accurate analysis for the case when $\sigma_w$ is either very small or very large. Refer to Proposition 3 and Theorems 8 and 9 for further information.


III. Our contributions in noiseless settings 


Table I summarizes all our contributions and the places where they appear. This section discusses our main results in the noiseless setting $\sigma_w = 0$. The discussion of the noisy setting is postponed until Section IV. We start with the optimal-$\lambda$ $\ell_p$-AMP. Since there is no measurement noise, the state of this system may converge to 0, i.e., $\sigma_t^2 \to 0$ as $t \to \infty$. If this happens, we say $\ell_p$-AMP has successfully recovered the sparse solution of $y = Ax_o$. Otherwise, we say $\ell_p$-AMP has failed. Depending on the under-determinacy value $\delta$, we may observe three different situations:

(i) $\Psi_{\lambda^*,p}(\sigma^2)$ has only one stable fixed point, at zero. In this case, optimal-$\lambda$ $\ell_p$-AMP is successful no matter where it is initialized.

(ii) $\Psi_{\lambda^*,p}(\sigma^2)$ has more than one stable fixed point, but $\sigma^2 = 0$ is still a stable fixed point. In this case, the performance of optimal-$\lambda$ $\ell_p$-AMP depends on its initialization. However, there exist initializations for which $\ell_p$-AMP is successful.

(iii) 0 is not a stable fixed point of $\Psi_{\lambda^*,p}(\sigma^2)$. In such cases, optimal-$\lambda$ $\ell_p$-AMP does not recover the right solution under any initialization.


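These three cases are summarized in Figure 5. Our goal is to identify the conditions under which each of these cases happens. The following quantities will play a pivotal role in our results:

$$M_p(\epsilon) = \inf_{\tau > 0}\, \sup_{\mu > 0} \left[(1-\epsilon)\,\mathbb{E}\big(\eta_p(Z;\tau)\big)^2 + \epsilon\,\mathbb{E}\big(\eta_p(\mu + Z;\tau) - \mu\big)^2\right],$$
$$\underline{M}_p(\epsilon) = \sup_{\mu > 0}\, \inf_{\tau > 0} \left[(1-\epsilon)\,\mathbb{E}\big(\eta_p(Z;\tau)\big)^2 + \epsilon\,\mathbb{E}\big(\eta_p(\mu + Z;\tau) - \mu\big)^2\right], \qquad (17)$$

where $\mathbb{E}$ is with respect to $Z \sim N(0,1)$. It is straightforward to confirm that $M_p(\epsilon) \ge \underline{M}_p(\epsilon)$. Our next theorem explains the conditions that are required for Case (i) above.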
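For $p = 0$ both quantities in (17) can be approximated with closed-form Gaussian integrals, because $\eta_0(u;\tau) = u\,\mathbb{I}(|u| > \sqrt{2\tau})$, so the infimum over $\tau > 0$ is an infimum over the hard threshold $t = \sqrt{2\tau}$. The Python sketch below is our own illustration (names hypothetical): it replaces the infimum and supremum by minima and maxima over finite grids, truncating $\mu$ to $(0, 8]$, so it only approximates $M_0(\epsilon)$ and $\underline{M}_0(\epsilon)$. On any finite grid the inf-sup value is at least the sup-inf value, mirroring $M_p(\epsilon) \ge \underline{M}_p(\epsilon)$.

```python
import math


def Q(x):
    """Standard normal upper tail P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def risk0(t, mu, eps):
    """(1-eps) E eta_0(Z)^2 + eps E(eta_0(mu + Z) - mu)^2 with hard threshold t, Z ~ N(0,1)."""
    zero_part = 2.0 * (t * phi(t) + Q(t))  # E[Z^2; |Z| > t]
    b1, b2 = -t - mu, t - mu  # mu + Z is zeroed iff b1 <= Z <= b2
    p_kill = Q(b1) - Q(b2)
    z2_kill = b1 * phi(b1) - b2 * phi(b2) + p_kill  # E[Z^2; b1 <= Z <= b2]
    nonzero_part = (1.0 - z2_kill) + mu * mu * p_kill
    return (1.0 - eps) * zero_part + eps * nonzero_part


def minimax_maximin(eps, t_grid, mu_grid):
    """Grid versions of M_0(eps) (inf_t sup_mu) and its sup-inf counterpart."""
    table = [[risk0(t, mu, eps) for mu in mu_grid] for t in t_grid]
    upper = min(max(row) for row in table)
    lower = max(min(table[i][j] for i in range(len(t_grid))) for j in range(len(mu_grid)))
    return upper, lower


t_grid = [0.02 * k for k in range(1, 301)]  # t in (0, 6]
mu_grid = [0.05 * k for k in range(1, 161)]  # mu in (0, 8]
upper, lower = minimax_maximin(0.05, t_grid, mu_grid)
print(f"inf-sup ~ {upper:.4f}, sup-inf ~ {lower:.4f}")
```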
explains the conditions that are required for Case (i) above. 


Theorem 3. Let $p_X \in \mathcal{F}_\epsilon$. If $M_p(\epsilon) < \delta$, then the highest stable fixed point of the optimal-$\lambda$ state evolution happens at zero. In other words, $\Psi_{\lambda^*,p}(\sigma^2)$ has a unique stable fixed point at zero. Furthermore, if $\underline{M}_p(\epsilon) > \delta$, then there exists a distribution $p_X \in \mathcal{F}_\epsilon$ for which $\Psi_{\lambda^*,p}(\sigma^2)$ has more than one stable fixed point.

Fig. 5. Three main cases that may arise in the noiseless setting for $p < 1$. (a) $\Psi_{\lambda^*,p}(\sigma^2)$ has only one stable fixed point, at zero; (b) $\Psi_{\lambda^*,p}(\sigma^2)$ has more than one stable fixed point, but $\sigma^2 = 0$ is still a stable fixed point; (c) 0 is not a stable fixed point of $\Psi_{\lambda^*,p}(\sigma^2)$.


The proof of this theorem is summarized in Section VI-E. Note that this theorem is concerned with the minimax framework. In other words, the minimum value of $\delta$ for which $\Psi_{\lambda^*,p}(\sigma^2)$ has a unique fixed point at zero depends on $p_X \in \mathcal{F}_\epsilon$. However, in most applications $p_X$ is not known, and we would like to ensure that the algorithm works for any distribution. Theorem 3 ensures that, under certain conditions, $\Psi_{\lambda^*,p}(\sigma^2)$ has a unique fixed point for any $p_X \in \mathcal{F}_\epsilon$.

Based on Theorem 3, we can discuss the first phase transition behavior of the optimal-$\lambda$ $\ell_p$-AMP algorithm. Let $p_X \in \mathcal{F}_\epsilon$. This phase transition behavior is discussed in the following corollary.


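Corollary 1. For every $0 < p < 1$ and $\delta$, there exists $\epsilon_p^*(\delta)$ such that for every $\epsilon < \epsilon_p^*(\delta)$, $\Psi_{\lambda^*,p}(\sigma^2)$ has only one stable fixed point, at zero. Furthermore, there exists $\bar{\epsilon}_p(\delta)$ such that for every $\epsilon > \bar{\epsilon}_p(\delta)$, $\Psi_{\lambda^*,p}(\sigma^2)$ has more than one stable fixed point for certain distributions in $\mathcal{F}_\epsilon$.

The proof is presented in Section VI-F.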

Our numerical results show that $\bar{\epsilon}_p(\delta) = \epsilon_p^*(\delta)$ holds for every $0 < p < 1$. If we accept this identity, then Corollary 1 proves the first type of phase transition that we observe in $\ell_p$-AMP: for certain distributions the algorithm switches from having one stable fixed point to having more than one stable fixed point at $\epsilon_p^*(\delta)$. Figure 6 exhibits $\epsilon_p^*(\delta)$ for several different values of $p$ ($\bar{\epsilon}_p(\delta)$ has the same value). While our simulation results confirm that $\epsilon_p^*(\delta) = \bar{\epsilon}_p(\delta)$ for every $0 < p < 1$, we have only proved this observation for $p = 1$.


Lemma 2. For $p = 1$, we have $\epsilon_1^*(\delta) = \bar{\epsilon}_1(\delta)$.


The proof of this claim will be presented in Section VI-G. Before we discuss and compare the phase transition curves that are shown in Figure 6, we discuss Case (ii), in which $\Psi_{\lambda^*,p}(\sigma^2)$ does not have a unique stable fixed point, but zero is still a stable fixed point.


Theorem 4. Let $p_X$ be an arbitrary distribution in $\mathcal{F}_\epsilon$. For any $0 < p < 1$, 0 is the lowest stable fixed point of $\Psi_{\lambda^*,p}(\sigma^2)$ if and only if $\delta > \epsilon$.


This theorem is proved in Section VI-H. There are two main features of this theorem that we would like to emphasize.


Remark 1. Compared to Theorem 3, this theorem is universal in the sense that the actual distribution that is picked from $\mathcal{F}_\epsilon$ does not have any impact on the behavior of the fixed point at 0. Furthermore, the number of measurements $\delta$ that is required for the stability of this fixed point is the same as the sparsity level $\epsilon$.


Remark 2. As long as $\delta > \epsilon$, zero is a stable fixed point for every value of $p$. As we will see later in Section V (under the assumptions of the Replica method), this fixed point gives the asymptotic results for the global minimizer of (1). Therefore, for every $0 < p < 1$, the global minimizer of (1) recovers $x_o$ accurately as long as $\delta > \epsilon$. This result seems counter-intuitive: if we are concerned with the noiseless setting, all $\ell_p$-minimization algorithms are the same. We will shed some light on this surprising phenomenon in Section IV, where we consider measurement noise.

Fig. 6. The value of $\epsilon_p^*(\delta)$ as a function of $\delta$ for several different values of $p$. As is clear from the figure, the improvement gained from $\ell_p$-AMP ($p < 1$) is minor in the noiseless setting. Also, the values of $p$ that are close to 1 are the only values of $p$ that can outperform $\ell_1$-AMP; $\ell_0$-AMP performs much worse than $\ell_1$-AMP. Note that in these phase transition calculations we have assumed that we do not have access to a good initialization for optimal-$\lambda$ $\ell_p$-AMP. Hence, these phase transitions are concerned with the behavior of $\ell_p$-AMP under the worst initializations. Theorem 4 shows that under a good initialization, optimal-$\lambda$ $\ell_p$-AMP outperforms optimal-$\lambda$ $\ell_1$-AMP by a large margin.


To provide a fair comparison between optimally tuned $\ell_p$-AMP and $\ell_1$-AMP algorithms, we study the performance of optimally tuned $\ell_1$-AMP in the following proposition. This result is similar to the results proved in [24]. Since for $p = 1$ we have already shown that $M_1(\epsilon) = \underline{M}_1(\epsilon)$ in the proof of Lemma 2, in the rest of the paper we will use the notation $M_1(\epsilon)$ instead.


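Proposition 2. Optimal-$\lambda$ $\ell_1$-AMP has a unique stable fixed point. Furthermore, in the noiseless setting and for every $p_X \in \mathcal{F}_\epsilon$, 0 is the unique stable fixed point of $\Psi_{\lambda^*,1}(\sigma^2)$ if and only if
$$\delta > M_1(\epsilon),$$
where $M_1(\epsilon)$, defined in (17) with $p = 1$, can be simplified to
$$M_1(\epsilon) = \inf_{\tau > 0}\, (1-\epsilon)\,\mathbb{E}\big(\eta_1(Z;\tau)\big)^2 + \epsilon(1 + \tau^2).$$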


We present the proof of this proposition in Section VI-I. Based on this result, we define the phase transition of the optimally tuned $\ell_1$-AMP. Denote
$$\epsilon_1(\delta) = \sup\{\epsilon : M_1(\epsilon) < \delta\}.$$
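The closed form in Proposition 2 makes $M_1(\epsilon)$ and the phase transition $\epsilon_1(\delta)$ easy to evaluate. In the Python sketch below (our illustration; the function names are hypothetical), $\mathbb{E}(\eta_1(Z;\tau))^2 = 2[(1+\tau^2)\bar\Phi(\tau) - \tau\phi(\tau)]$ is the standard Gaussian integral for soft thresholding. The infimum over $\tau$ is found by ternary search, which is valid because the objective is convex in $\tau$ (the second derivative of its first term is $4(1-\epsilon)\bar\Phi(\tau) > 0$). $\epsilon_1(\delta)$ is then found by bisection, using the fact that $M_1(\epsilon)$ increases in $\epsilon$ (its derivative in $\epsilon$ is $1 + \tau^2 - \mathbb{E}(\eta_1(Z;\tau))^2 > 0$). The search interval $\tau \in [0, 10]$ is an assumption that is adequate for moderate $\epsilon$.

```python
import math


def Q(x):
    """Standard normal upper tail P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def soft_zero_risk(tau):
    """E(eta_1(Z; tau))^2 = 2[(1 + tau^2) Q(tau) - tau phi(tau)] for Z ~ N(0, 1)."""
    return 2.0 * ((1.0 + tau * tau) * Q(tau) - tau * phi(tau))


def M1(eps, lo=0.0, hi=10.0, iters=200):
    """M_1(eps) = inf_tau (1 - eps) E eta_1(Z; tau)^2 + eps (1 + tau^2)."""
    def f(tau):
        return (1.0 - eps) * soft_zero_risk(tau) + eps * (1.0 + tau * tau)
    for _ in range(iters):  # ternary search; f is convex in tau
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return f(0.5 * (lo + hi))


def eps1(delta, iters=60):
    """epsilon_1(delta) = sup{eps : M_1(eps) < delta}, by bisection on eps in (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if M1(mid) < delta:
            lo = mid
        else:
            hi = mid
    return lo


for delta in (0.1, 0.3, 0.5, 0.7):
    print(f"delta={delta}: epsilon_1(delta) ~ {eps1(delta):.4f}")
```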


Corollary 2. In the noiseless setting, if $\epsilon < \epsilon_1(\delta)$, the state evolution of optimal-$\lambda$ $\ell_1$-AMP has only one stable fixed point, at zero, for every $p_X \in \mathcal{F}_\epsilon$. Furthermore, if $\epsilon > \epsilon_1(\delta)$, then for every $p_X \in \mathcal{F}_\epsilon$ the fixed point at zero becomes unstable and the state evolution has one nonzero stable fixed point.


The proof is a simple implication of Proposition 2 and is similar to the proof of Corollary 1. Hence it is skipped here. We can now compare the performance of optimal-$\lambda$ $\ell_p$-AMP with optimal-$\lambda$ $\ell_1$-AMP. We first emphasize the following points:

(i) Optimal-$\lambda$ $\ell_1$-AMP has only one stable fixed point, while in general optimal-$\lambda$ $\ell_p$-AMP has multiple stable fixed points.

(ii) In the noiseless setting, 0 is a fixed point for both optimal-$\lambda$ $\ell_1$-AMP and optimal-$\lambda$ $\ell_p$-AMP. The stability of this fixed point depends only on the sparsity level $\epsilon$ and not on the specific choice of $p_X$ that is picked from $\mathcal{F}_\epsilon$. The range of values of $\epsilon$ for which 0 is a stable fixed point of optimal-$\lambda$ $\ell_p$-AMP is much wider than that of optimal-$\lambda$ $\ell_1$-AMP, as shown in Figure 7.
Fig. 7. Comparison of the best performance of optimal-$\lambda$ $\ell_p$-AMP for $p < 1$ (under the best initialization) with optimal-$\lambda$ $\ell_1$-AMP. The phase transition is the same for every $p < 1$. According to the Replica method, the phase transition of optimal-$\lambda$ $\ell_p$-AMP corresponds to the phase transition curve of the solution of (1).


(iii) $\Psi_{\lambda^*,p}(\sigma^2)$ may have another stable fixed point in addition to 0 for $0 < p < 1$. The value of $\epsilon$ below which $\Psi_{\lambda^*,p}(\sigma^2)$ has only one stable fixed point, at zero, depends on the distribution $p_X$. Theorem 3 characterizes the condition under which zero is the unique fixed point for every $p_X \in \mathcal{F}_\epsilon$. This specifies another phase transition for $\ell_p$-AMP, which we called $\epsilon_p^*(\delta)$ (note that in this argument we are assuming the equality $\epsilon_p^*(\delta) = \bar{\epsilon}_p(\delta)$). These phase transition curves are exhibited in Figure 6. As is clear from the figure, for small values of $p$ the corresponding phase transition curve falls far below the phase transition curve of optimally tuned $\ell_1$-AMP. For $p > 0.9$, some improvement can be gained from $\ell_p$-AMP, but the improvement is marginal.


As is clear from the comparison of the phase transitions in Figure 6 and Figure 7, a good initialization can lead to a major improvement in the performance of $\ell_p$-AMP for $p < 1$. According to Folklore Theorem (iii) mentioned in Section I, we expect $p$-continuation to provide such an initialization. Hence, we study the performance of the optimal-$(p, \lambda)$ $\ell_p$-AMP. Refer to Section II-E for more information on the connection between the optimal adaptation policy and the $p$-continuation that we discussed in the introduction.


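Theorem 5. If $\inf_{0<p<1} M_p(\epsilon) < \delta$, then the highest stable fixed point of optimal-$(p, \lambda)$ $\ell_p$-AMP happens at zero. In other words, $\Psi_{\lambda^*,p^*}(\sigma^2)$ has a unique stable fixed point at zero. Furthermore, if
$$\sup_{\mu > 0}\, \inf_{0 < p < 1}\, \inf_{\tau > 0} \left[(1-\epsilon)\,\mathbb{E}\big(\eta_p(Z;\tau)\big)^2 + \epsilon\,\mathbb{E}\big(\eta_p(\mu + Z;\tau) - \mu\big)^2\right] > \delta,$$
then there exists a distribution $p_X \in \mathcal{F}_\epsilon$ for which $\Psi_{\lambda^*,p^*}(\sigma^2)$ has an extra stable fixed point in addition to zero.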


The proof of this theorem is very similar to the proof of Theorem 3 and hence is skipped here.

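Corollary 3. $\Psi_{\lambda^*,p^*}(\sigma^2)$ has a unique stable fixed point at zero if $\epsilon < \sup_{0<p<1} \epsilon_p^*(\delta)$. Furthermore, there exists $\epsilon^{**}(\delta)$ such that if $\epsilon > \epsilon^{**}(\delta)$, then for certain distributions $p_X \in \mathcal{F}_\epsilon$, $\Psi_{\lambda^*,p^*}(\sigma^2)$ has more than one stable fixed point.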


The proof of this corollary is straightforward and is skipped. Again, our numerical calculations confirm that
$$\epsilon^{**}(\delta) = \sup_{0<p<1} \epsilon_p^*(\delta).$$

Corollary 3 has a simple implication for adaptation policies (and also for $p$-continuation). The performance of optimal-$(p, \lambda)$ $\ell_p$-AMP is the same as the performance of optimal-$\lambda$ $\ell_p$-AMP for the best value of $p$. In this sense, the only help that the optimal adaptation policy provides is to automatically find the best value of $p$ for running optimal-$\lambda$ $\ell_p$-AMP. Figure 8 compares the phase transition of optimal-$(p, \lambda)$ $\ell_p$-AMP with that of optimal-$\lambda$ $\ell_1$-AMP. As we expected from Theorem 5, the improvement is minor.

Fig. 8. Comparison of the phase transition of optimal-$\lambda$ $\ell_1$-AMP and optimal-$(p, \lambda)$ $\ell_p$-AMP under the minimax framework. The phase transition exhibited for optimal-$(p, \lambda)$ $\ell_p$-AMP is the value of $\epsilon$ at which the number of stable fixed points of optimal-$(p, \lambda)$ $\ell_p$-AMP changes from one to more than one for at least some prior $p_X \in \mathcal{F}_\epsilon$.

The results we have presented so far regarding the highest fixed point of $\ell_p$-AMP are disappointing. It seems that if we do not initialize the algorithm properly (and in practice, in most cases, we will not be able to do so), then the performance of the algorithm is at best slightly better than that of $\ell_1$-AMP. However, simulation results presented elsewhere have shown that iterative algorithms that aim to solve LPLS usually outperform LASSO. Such simulation results are not in contradiction with the results we present in this paper. On the contrary, they can be explained within the framework we developed. Let the distribution of $X$ be denoted by $X \sim (1 - \epsilon)\Delta_0 + \epsilon G$, where $\Delta_0$ denotes a point mass at zero and $G$ denotes the distribution of the nonzero elements. According to Proposition 2, the phase transition curve of the optimal-$\lambda$ $\ell_1$-AMP is independent of $G$ and depends only on $\epsilon$. This is not true for the phase transition of $\ell_p$-AMP (the one derived based on the highest fixed point of $\Psi_{\lambda^*,p}(\sigma^2)$). In fact, the results in Theorem 3 are obtained under the least favorable distribution, which is a certain choice of $G$ that leads to the lowest possible phase transition of $\ell_p$-AMP. For other distributions, optimal-$\lambda$ $\ell_p$-AMP can provide a higher phase transition. Figure 9 compares the phase transition (based on the highest fixed point) of optimal-$\lambda$ $\ell_p$-AMP with that of optimal-$\lambda$ $\ell_1$-AMP when $G = N(0,1)$. As is clear from this figure, such distributions usually favor $\ell_p$-AMP but not the $\ell_1$-AMP algorithm. For instance, here $p = 0.75$ has a much higher phase transition than optimal-$\lambda$ $\ell_1$-AMP.

It is important to note that for different distributions, different values of $p$ provide the best phase transition. However, if we employ optimal-$(p, \lambda)$ $\ell_p$-AMP, it will find the optimal value of $p$ automatically. Hence, even though the continuation strategy does not provide much improvement in the minimax setting, it can in fact offer a huge boost in performance in practical applications.


IV. Our contributions in noisy setting 


A. Roadmap 

In this section, we assume that $\sigma_w^2 > 0$. This implies that the reconstruction error is nonzero for every $\ell_p$-AMP. We start by analyzing the performance of optimal-$\lambda$ $\ell_p$-AMP. This corresponds to the analysis of the fixed points of $\Psi_{\lambda^*,p}(\sigma^2)$. In general, $\Psi_{\lambda^*,p}(\sigma^2)$ may have more than one stable fixed point. Similar to the last section, we study two of the fixed points of this function in Sections IV-B and IV-C, respectively: (i) the lowest fixed point, which corresponds to the performance of the algorithm under the best initialization, and (ii) the highest fixed point, which corresponds to the performance of the algorithm under the worst initializations. We have empirically observed that under the initialization that we use, i.e., $x^0 = 0$, the algorithm converges to the highest fixed point.

Note that in this paper we are only interested in one performance measure of $\ell_p$-AMP algorithms, namely the reconstruction error. An adaptation policy may also improve the convergence rate of the algorithm.

Fig. 9. Phase transition curve for $p_X = (1 - \epsilon)\Delta_0 + \epsilon G$, where $G$ is the standard normal distribution. The values $p \in \{0, 0.25, 0.5, 0.75, 0.9, 1\}$ are considered in this simulation. These phase transition curves should be compared with those of Figure 6.


B. Analysis of the lowest fixed point 

In this section we study the lowest fixed point of optimally tuned $\ell_p$-AMP. We use the notation $\sigma_l^2$ for the lowest fixed point of $\Psi_{\lambda^*,p}(\sigma^2)$. Our first result is concerned with the performance of the algorithm for a small amount of noise.


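Theorem 6. If $\epsilon < \delta$, then there exists $\sigma_0$ such that for every $\sigma_w < \sigma_0$, $\sigma_l^2$ is a continuous function of $\sigma_w^2$. Furthermore,
$$\lim_{\sigma_w^2 \to 0} \frac{\sigma_l^2}{\sigma_w^2} = \frac{1}{1 - \epsilon/\delta}.$$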


The proof is presented in Section VI-J. It is instructive to compare this result with the corresponding result for the optimal-$\lambda$ $\ell_1$-AMP.


Theorem 7. If $M_1(\epsilon) < \delta$, then the fixed point of optimal-$\lambda$ $\ell_1$-AMP is unique and satisfies
$$\lim_{\sigma_w^2 \to 0} \frac{\sigma^2}{\sigma_w^2} = \frac{1}{1 - M_1(\epsilon)/\delta}.$$


This result can be derived from the results of [24]. But for the sake of completeness, and since we are using a different thresholding policy, we present the proof in Section VI-K.


Remark 3. In Section VI-F we show that $M_1(\epsilon) = \inf_{\tau}\, (1-\epsilon)\,\mathbb{E}(\eta_1(Z;\tau))^2 + \epsilon(1+\tau^2) > \epsilon$. Hence, in the limit $\sigma_w^2 \to 0$, the lowest fixed point of optimal-$\lambda$ $\ell_p$-AMP performs better than the fixed point of optimal-$\lambda$ $\ell_1$-AMP. The continuity of $\sigma_l^2$ as a function of $\sigma_w^2$ implies that this comparison remains valid for small values of $\sigma_w^2$.
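The two small-noise limits in Theorems 6 and 7, and the comparison in Remark 3, can be evaluated directly. The Python sketch below is our illustration (helper names hypothetical); it recomputes $M_1(\epsilon)$ from the closed form in Proposition 2 by ternary search over $\tau \in [0, 10]$ and compares $1/(1 - \epsilon/\delta)$ with $1/(1 - M_1(\epsilon)/\delta)$ for $\delta = 0.1$ and $\epsilon = 0.01$, the setting used in Figure 10.

```python
import math


def Q(x):
    """Standard normal upper tail P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def M1(eps, lo=0.0, hi=10.0, iters=200):
    """M_1(eps) = inf_tau (1 - eps) E eta_1(Z; tau)^2 + eps (1 + tau^2) (convex in tau)."""
    def f(tau):
        zero_risk = 2.0 * ((1.0 + tau * tau) * Q(tau) - tau * phi(tau))
        return (1.0 - eps) * zero_risk + eps * (1.0 + tau * tau)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return f(0.5 * (lo + hi))


def small_noise_ratios(eps, delta):
    """Limits of sigma^2 / sigma_w^2 as sigma_w -> 0: Theorem 6 (lowest fixed point, p < 1)
    and Theorem 7 (optimal-lambda l1-AMP, which requires M_1(eps) < delta)."""
    m1 = M1(eps)
    if m1 >= delta:
        raise ValueError("Theorem 7 requires M_1(eps) < delta")
    return 1.0 / (1.0 - eps / delta), 1.0 / (1.0 - m1 / delta), m1


lp_ratio, l1_ratio, m1 = small_noise_ratios(0.01, 0.1)
print(f"M_1 = {m1:.4f}; l_p lowest fixed point: {lp_ratio:.3f}; l_1: {l1_ratio:.3f}")
```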


What happens as we keep increasing $\sigma_w^2$? Figure 10, which is based on our numerical calculations, answers this question. It compares $\sigma_l^2$ as a function of $\sigma_w^2$ for several different values of $p$. Two interesting phenomena can be observed in this figure:

(i) Low-noise phenomenon: For small values of $\sigma_w^2$, the lowest fixed point of $\ell_0$-AMP outperforms the lowest fixed point of all the other $\ell_p$-AMP algorithms. Furthermore, smaller values of $p$ seem to have an advantage over larger values of $p$. Note that Theorem 6 does not explain this observation: according to this theorem, all values of $p < 1$ seem to perform similarly. We will present a refinement of Theorem 6 in Theorems 8 and 9 that is capable of explaining this phenomenon.
(ii) High-noise phenomenon: For large values of $\sigma_w^2$, optimally tuned $\ell_1$-AMP outperforms even the lowest fixed point of $\ell_p$-AMP for every $p < 1$. As we mentioned before, we will connect the lowest fixed point of $\ell_p$-AMP with the global minimizer of LPLS. This means that LASSO will outperform the global minimizer of $\ell_0$-regularized least squares for large values of noise. This contradicts the first folklore we mentioned in the introduction. Proposition 3 will prove this observation.



Fig. 10. The curve of $\sigma_l^2$ as a function of $\sigma_w^2$ for $p \in \{0, 0.3, 0.6, 0.9, 1\}$. Note that (i) for $p = 0$, $\sigma_l^2$ is a discontinuous function of $\sigma_w^2$; (ii) for small values of $\sigma_w^2$, $p = 0$ provides the smallest $\sigma_l^2$, while for large values of $\sigma_w^2$, $p = 1$ exhibits the best performance. Here is the set-up for this simulation: $\delta$ and $\epsilon$ are set to 0.1 and 0.01, respectively, and the nonzero elements of $x_o$ are i.i.d. $\pm 1$ with probability 0.5.


Below we justify both the low-noise and high-noise observations. The next two theorems are concerned with the low-noise phenomenon. According to Theorem 6, we know that $\sigma_l^2/\sigma_w^2 \to \delta/(\delta - \epsilon)$. Hence, in order to see the discrepancy between different values of $p$, we have to explore how $\sigma_l^2 - \delta\sigma_w^2/(\delta - \epsilon)$ behaves for small values of $\sigma_w^2$. Let $X \sim (1 - \epsilon)\Delta_0 + \epsilon G$, and let $U$ denote a random variable with distribution $G$. We also use the notation $\mathbb{E}_G f(U) = \int f(u)\, dG(u)$ and $\mathbb{P}_G(U \in A) = \mathbb{E}_G(\mathbb{1}(U \in A))$, where $\mathbb{1}$ denotes the indicator function.

Theorem 8 . Suppose Pg{\U\ > p) = 1 with p being a fixed positive number and Edf/p < oo, then for 0 < p < 1 
and e < 6, 

_ ec^-VEIc/pp- 252 -p 

. ™o ai-^^{\og (4 - 4p)2-P(5 - 6)3-t • 

The proof of this result can be found in Section VI-M. Before we interpret this result, let us discuss the result for $p = 0$ as well. Note that Theorem 8 does not cover the case $p = 0$.

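Theorem 9. Suppose $\mathbb{E}_G|U|^2 < \infty$ and $\mathbb{P}_G(|U| > \mu) = 1$, where $\mu = \sup_u\{u : \mathbb{P}(|U| > u) = 1\} > 0$. Then for $p = 0$ and $\epsilon < \delta$,
$$\sigma_l^2 = \frac{\delta\,\sigma_w^2}{\delta - \epsilon} + o\big(\phi(\rho\,\sigma_w^{-1})\big),$$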



where p is any constant that is smaller than ^ 

The proof of this theorem is presented in Section VI-N. We now discuss how these theorems explain the low-noise phenomenon in Figure 10. Suppose that we ignore all the logarithmic terms and study the second dominant term in the expression of $\sigma_l^2$ that we derived in Theorem 8. There are two facts we should emphasize here: (i) the order of the second dominant term depends on $p$, and the term is smaller for smaller values of $p$; (ii) the second dominant term is positive. Combining these two facts, we conclude that if $p_1 < p_2$, then for small enough $\sigma_w^2$ the lowest fixed point of optimally tuned $\ell_{p_1}$-AMP outperforms that of optimally tuned $\ell_{p_2}$-AMP, which confirms our observation in Figure 10. More interestingly, according to Theorem 9, for the case $p = 0$:
$$\sigma_l^2 = \frac{\delta\,\sigma_w^2}{\delta - \epsilon} + o\big(\phi(\rho\,\sigma_w^{-1})\big).$$
Here, the second dominant term for $p = 0$ decays exponentially faster than the polynomial rate for $p > 0$. Hence $\ell_0$-AMP will outperform $\ell_p$-AMP for $p > 0$ in the low-noise regime, which is again consistent with Figure 10. Another interesting feature of this theorem is its implication for values of $p$ that are less than 1 but close to it. Figure 10 shows that their performance is in fact close to that of LASSO. If we look only at the first dominant term in Theorem 8, even $p = 0.99$ may seem to outperform LASSO by a large margin. However, note that the order of the second dominant term for $p = 0.99$ is quite close to the order of the first dominant term. Hence, any judgment based on the first dominant term in such cases is inaccurate and misleading. This shows the importance of the second dominant term in these cases.

So far, we have analyzed the lowest fixed point of $\Psi_{\lambda^*,p}(\sigma^2)$ and have seen that $p < 1$ may lead to major improvements over optimal-$\lambda$ $\ell_1$-AMP if the noise level is not large. Our next goal is to prove the “high-noise phenomenon”, i.e., the fact that for large values of noise, optimally tuned $\ell_1$-AMP outperforms optimally tuned $\ell_p$-AMP for $p < 1$.

Proposition 3. Suppose $X \sim (1 - \epsilon)\Delta_0 + \epsilon\Delta_\mu$, where $\mu$ is a nonzero constant. For any $0 < p < 1$, there exists a threshold $\sigma_0$ such that optimal-$\lambda$ $\ell_1$-AMP outperforms the lowest fixed point of optimal-$\lambda$ $\ell_p$-AMP for all $\sigma_w > \sigma_0$.

The proof of this result is presented in Section VI-L. This proposition implies that even if we had access to the best initialization for the optimal-$\lambda$ $\ell_p$-AMP, we should still use optimal-$\lambda$ $\ell_1$-AMP when the measurement noise is large. Note that even though this proposition is concerned with very large values of the measurement noise, Figure 10 shows that $\ell_1$-AMP outperforms $\ell_p$-AMP even for moderate noise levels.


C. Analysis of the highest fixed point of optimally tuned $\ell_p$-AMP

So far we have analyzed the lowest fixed point of optimally tuned $\ell_p$-AMP. In this section we study its highest fixed point in the presence of noise.


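Theorem 10. Let $\sigma_h^2$ denote the highest fixed point of the optimal-$\lambda$ $\ell_p$-AMP. If $M_p(\epsilon) < \delta$, then
$$\frac{\sigma_h^2}{\sigma_w^2} \le \frac{1}{1 - M_p(\epsilon)/\delta}. \qquad (18)$$
Furthermore, there exist a distribution $p_X \in \mathcal{F}_\epsilon$ and a noise variance $\sigma_w^2$ for which
$$\frac{\sigma_h^2}{\sigma_w^2} \ge \frac{1}{1 - \underline{M}_p(\epsilon)/\delta}. \qquad (19)$$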


This theorem is proved in Section VI-O. We again emphasize that our numerical calculations show that $M_p(\epsilon) = \underline{M}_p(\epsilon)$. Also, in the proof of Lemma 2 we have proved that $M_1(\epsilon) = \underline{M}_1(\epsilon)$. Figure 11 compares $M_p(\epsilon)$ for different values of $p$. For most $p < 1$, $M_p(\epsilon)$ is either larger than $M_1(\epsilon)$ or, in some cases, slightly lower. Hence, as far as the highest fixed point of the $\ell_p$-AMP algorithm on the least favorable signals is concerned, optimal-$\lambda$ $\ell_p$-AMP can offer slight improvements (if any at all) over $\ell_1$-AMP.

Again we would like to emphasize that the bound $\sigma_h^2/\sigma_w^2 \le 1/(1 - M_p(\epsilon)/\delta)$ is achieved only for very specific distributions. If the distribution of $X$ is different from those, optimally tuned $\ell_p$-AMP can achieve major improvement over optimal-$\lambda$ $\ell_1$-AMP. An interesting question that is left for future research is which distributions benefit LPLS more.

As is clear from our discussion, optimal-$\lambda$ $\ell_p$-AMP can outperform $\ell_1$-AMP for small values of noise if it reaches its lowest fixed point. Also, since in many cases $\ell_p$-AMP has other fixed points, it requires a good initialization to reach its lowest fixed point. Our next goal is to determine whether an optimal adaptation policy can resolve the issue of finding a good initialization. As we showed in the last section, in the noiseless setting it does not offer much improvement. However, when the noise is small, this algorithm outperforms optimal-$\lambda$ $\ell_1$-AMP by a large margin. The following theorem confirms this claim.


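Theorem 11. Let $\sigma_h^2$ denote the highest fixed point of the optimal-$(p, \lambda)$ $\ell_p$-AMP. If $\inf_{0<p<1} M_p(\epsilon) < \delta$, then
$$\lim_{\sigma_w^2 \to 0} \frac{\sigma_h^2}{\sigma_w^2} = \frac{1}{1 - \epsilon/\delta}.$$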
Fig. 11. $M_p(\epsilon)$ as a function of $\epsilon$ for $p \in \{0, 0.3, 0.6, 0.9, 1\}$. Lower values of $M_p(\epsilon)$ lead to better upper bounds for $\sigma_h^2/\sigma_w^2$ in (18).



Fig. 12. Comparison of the highest fixed point $\sigma_h^2$ in optimal-$\lambda$ $\ell_1$-AMP and optimal-$(p, \lambda)$ $\ell_p$-AMP. $\delta$ and $\epsilon$ are set to 0.1 and 0.01, respectively. The nonzero elements of $x_o$ are i.i.d. $\pm 1$ with probability 0.5.


The proof of this theorem is presented in Section VI-P. Note that there is a major difference between this theorem and Theorem 6: this result is about the highest fixed point, while Theorem 6 evaluates the lowest fixed point. According to this theorem, if the sparsity level of the signal is below the phase transition of optimal-$(p, \lambda)$ $\ell_p$-AMP, then optimal-$(p, \lambda)$ $\ell_p$-AMP offers much better noise sensitivity than optimal-$\lambda$ $\ell_1$-AMP (for small values of noise). Note also that, according to Proposition 3, we expect the noise sensitivity of optimal-$(p, \lambda)$ $\ell_p$-AMP to be the same as that of optimal-$\lambda$ $\ell_1$-AMP for large values of noise. This phenomenon can be observed in Figure 12. As is clear in this figure, for small values of the noise, $p$-continuation leads to substantially better results than the optimal-$\lambda$ $\ell_1$-AMP.


V. Relation with $\ell_p$-norm minimization

The Replica method is a non-rigorous method invented in statistical physics to study the behavior of large magnetic and disordered systems. This method has found many applications in science and engineering [34]-[37]. In particular, [15] has used this method to analyze the accuracy of $\hat{x}(\gamma, p)$. Here we briefly explain the results derived in [15] and compare them with the results of our paper. Under the replica symmetry assumptions (summarized in Section IV of [15]), as $N \to \infty$, $(\hat{x}_j(\gamma, p), x_{o,j})$ converges in distribution to the random vector $(\eta_p(X + \sigma_{\text{eff}} Z; \gamma_p), X)$, where $X \sim p_X$ and $Z \sim N(0, 1)$ are independent, and $\sigma_{\text{eff}}^2$ satisfies the following fixed point equation:
$$\sigma_{\text{eff}}^2 = \sigma_w^2 + \frac{1}{\delta}\,\mathbb{E}\big(\eta_p(X + \sigma_{\text{eff}} Z; \gamma_p) - X\big)^2, \qquad (20)$$
where $\gamma_p$ can also be calculated in terms of $\gamma$ and $\sigma_{\text{eff}}^2$, but its particular form is not of interest in this paper. Note that the fixed point of $\ell_p$-AMP satisfies a fixed point equation that is the same as (20) (modulo the threshold parameter). If we pick $\gamma$ optimally in (20) (to make $\sigma_{\text{eff}}^2$, or the mean square error of the reconstructed vector, as small as possible), then the two fixed point equations derived from $\ell_p$-AMP and from the Replica method are exactly the same. This exact correspondence transforms all the results about lowest fixed points that we derived for optimal-$\lambda$ $\ell_p$-AMP into new results for the solution of (1). For the sake of brevity, we do not repeat all the results here. We qualitatively explain the implications of two of our results:


1) If $\delta > \epsilon$, then (1) recovers the exact solution in the noiseless setting for any $0 < p < 1$. This can be derived by combining Theorem 4 with the result of the Replica method described above. Note that among all the fixed points of (20), the lowest fixed point corresponds to the minimum free energy [34] and hence characterizes the asymptotic performance of the global minimizer of (1).

2) When the noise level $\sigma_w^2$ is high, LASSO outperforms LPLS for every $p < 1$. This result can be derived by combining Proposition 3 with the Replica method result.


VI. Proofs of the main results

A. Properties of $\eta_p(u; \lambda)$

In the proofs of our main results, we employ several properties of the proximal functions $\eta_p(u; \lambda)$. This section is devoted to the derivation of these properties. Note that since $\eta_0(u; \lambda)$ and $\eta_1(u; \lambda)$ have very simple forms, these two cases are omitted in some of the results below.


Our first result is concerned with the scale invariance property of $\eta_p(u; \lambda)$. This result will be used extensively in the rest of the paper.


Lemma 3. $\eta_p(u; \lambda)$ has the following scale invariance properties for $0 \le p \le 1$:

(i) $\eta_p(-u; \lambda) = -\eta_p(u; \lambda)$.

(ii) $\eta_p(au; \lambda a^{2-p}) = a\,\eta_p(u; \lambda)$ for every $a > 0$.


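Proof: First, we prove that $\eta_p(-u; \lambda) = -\eta_p(u; \lambda)$. According to the definition of $\eta_p$, we have
$$\eta_p(-u; \lambda) = \arg\min_x\, \tfrac{1}{2}(-u - x)^2 + \lambda|x|^p = \arg\min_x\, \tfrac{1}{2}\big(u - (-x)\big)^2 + \lambda|-x|^p = -\arg\min_x\, \tfrac{1}{2}(u - x)^2 + \lambda|x|^p = -\eta_p(u; \lambda).$$
To prove the second part of this lemma, note that it is trivially true when $a = 0$. For any $a > 0$, we have
$$\eta_p(au; \lambda a^{2-p}) = \arg\min_x\, \tfrac{1}{2}(au - x)^2 + \lambda a^{2-p}|x|^p = \arg\min_x\, a^2\left[\tfrac{1}{2}\left(u - \frac{x}{a}\right)^2 + \lambda\left|\frac{x}{a}\right|^p\right] = a\,\arg\min_x\, \tfrac{1}{2}(u - x)^2 + \lambda|x|^p = a\,\eta_p(u; \lambda).$$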
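Lemma 3 can also be checked numerically with a brute-force evaluation of $\eta_p$. The Python sketch below is an illustration only: the grid search and the name `eta_brute` are our assumptions, it uses the objective $\frac{1}{2}(u-x)^2 + \lambda|x|^p$, and it covers $0 < p \le 1$ because `0.0 ** 0` evaluates to 1 in floating point, which would misrepresent the $\ell_0$ penalty at $x = 0$. It reports the largest deviation from properties (i) and (ii) over a few test values.

```python
import numpy as np


def eta_brute(u, lam, p, n=100001):
    """Brute-force argmin_x 0.5*(u - x)**2 + lam*|x|**p on a symmetric grid over [-|u|, |u|].

    Illustration only, for 0 < p <= 1; the minimiser always lies in [-|u|, |u|].
    """
    r = max(abs(u), 1e-12)
    x = np.linspace(-r, r, n)
    cost = 0.5 * (u - x) ** 2 + lam * np.abs(x) ** p
    return float(x[np.argmin(cost)])


lam = 0.5
max_err = 0.0  # largest deviation in property (ii)
sym_err = 0.0  # largest deviation in property (i)
for p in (0.3, 0.5, 0.8, 1.0):
    for u in (0.3, 1.2, -1.7, 2.5):
        base = eta_brute(u, lam, p)
        sym_err = max(sym_err, abs(eta_brute(-u, lam, p) + base))
        for a in (0.5, 2.0, 3.0):
            lhs = eta_brute(a * u, lam * a ** (2.0 - p), p)
            max_err = max(max_err, abs(lhs - a * base))
print(f"property (i) max deviation: {sym_err:.2e}; property (ii) max deviation: {max_err:.2e}")
```

Both deviations should be of the order of the grid spacing, since the identities hold exactly and only the discretization introduces error.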


The next lemma is an auxiliary result that will be used later to derive the main properties of $\eta_p(u; \lambda)$.


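Lemma 4. For $0 < p < 1$, if $|\eta_p(u; \lambda)| > 0$, then it satisfies
$$|\eta_p(u; \lambda)| > \zeta^*,$$
where $\zeta^* = (\lambda p(1-p))^{1/(2-p)}$. Furthermore, $\eta_p(u; \lambda) = 0$ for every $u$ satisfying
$$|u| < g(\zeta^*),$$
where $g(\zeta) = \zeta + \lambda p\,\zeta^{p-1}$.

$\eta_0(u; \lambda) = u\,\mathbb{I}(|u| > \sqrt{2\lambda})$ and $\eta_1(u; \lambda) = (|u| - \lambda)\,\mathrm{sign}(u)\,\mathbb{I}(|u| > \lambda)$.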


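Proof: According to Lemma 3 part (i), we only consider the case $u > 0$. If $\eta_p(u;\lambda) > 0$, since it minimizes
$$\xi_p(x,u) = \tfrac{1}{2}(u - x)^2 + \lambda|x|^p,$$
it must satisfy
$$\eta_p(u;\lambda) + \lambda p\,\eta_p^{p-1}(u;\lambda) = u,$$
which can be written as
$$g(\eta_p(u;\lambda)) = u. \quad (21)$$
It is straightforward to check the following facts about $g(\zeta)$: (i) $g(\zeta)$ has a global minimum at $\zeta^* = (\lambda p(1-p))^{\frac{1}{2-p}}$; (ii) $\lim_{\zeta\to 0} g(\zeta) = \infty$; (iii) $\lim_{\zeta\to\infty} g(\zeta) = \infty$; (iv) $g(\zeta)$ is a decreasing function below $\zeta^*$ and an increasing function above $\zeta^*$.

According to these properties, three different cases can happen for $g(\zeta) = u$:

(i) If $u < g(\zeta^*)$, then $g(\zeta) = u$ does not have any solution.

(ii) If $u = g(\zeta^*)$, then $g(\zeta) = u$ has only one solution, at $\zeta^*$.

(iii) If $u > g(\zeta^*)$, then $g(\zeta) = u$ has two solutions, one below $\zeta^*$ and one above $\zeta^*$. Among these two solutions, the value of $x$ that minimizes $\xi_p(x,u)$ is the one with $x > \zeta^*$: indeed, $\partial^2\xi_p(x,u)/\partial x^2 = 1 + \lambda p(p-1)x^{p-2} = g'(x)$ is negative below $\zeta^*$, so the smaller solution is a local maximum.

This completes the proof of the first part of the lemma. We now prove that for every $u < g(\zeta^*)$, $\eta_p(u;\lambda) = 0$. This is due to the fact that the derivative of $\xi_p(x,u)$ with respect to $x$, namely $g(x) - u$, is positive for every $x > 0$. Hence, since $\xi_p(x,u)$ is a continuous function of $x$, the minimum over $x \ge 0$ must happen at zero; negative values of $x$ cannot do better, because $\xi_p(-x,u) \ge \xi_p(x,u)$ for $x \ge 0$ and $u > 0$.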
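The proof of Lemma 4 also suggests a direct way to evaluate $\eta_p$ for $0<p<1$: when $|u| > g(\zeta^*)$, locate the larger root of $g(\zeta) = |u|$ by bisection on $[\zeta^*, |u|]$ (where $g$ is increasing and $g(|u|) > |u|$), and keep it only if it beats $x = 0$. The following is a minimal sketch under those assumptions; the name `eta_p` and the fixed iteration count are illustrative choices, not the paper's implementation.

```python
import math


def eta_p(u, lam, p, iters=200):
    """l_p proximal map for 0 < p < 1 via the larger root of g(z) = |u| (proof of Lemma 4)."""
    a = abs(u)
    zeta_star = (lam * p * (1.0 - p)) ** (1.0 / (2.0 - p))
    g = lambda z: z + lam * p * z ** (p - 1.0)
    if a <= g(zeta_star):
        return 0.0  # g(z) = |u| has no root above zeta*, so the minimizer is 0
    lo, hi = zeta_star, a  # g(lo) < |u| < g(hi), and g is increasing on [lo, hi]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < a:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    # Keep the nonzero stationary point only if it beats x = 0.
    if 0.5 * (a - x) ** 2 + lam * x ** p < 0.5 * a * a:
        return math.copysign(x, u)
    return 0.0


if __name__ == "__main__":
    print(eta_p(3.0, 1.0, 0.5), eta_p(1.0, 1.0, 0.5))
```

Comparing the objective at the root with its value at zero is exactly the step that the next lemma turns into an explicit threshold.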


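Lemma 5. For $0 \le p \le 1$, there exists a threshold $\lambda_p$ such that $\eta_p(u;\lambda) = 0$ for all $|u| < \lambda_p$ and $|\eta_p(u;\lambda)| > 0$ for all $|u| > \lambda_p$. Furthermore, we have
$$\lambda_p = c_p\,\lambda^{\frac{1}{2-p}},$$
where $c_p = [2(1-p)]^{\frac{1}{2-p}} + p\,[2(1-p)]^{\frac{p-1}{2-p}}$. The value of $c_p$ is plotted in Figure 13 for different values of $p$.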



Fig. 13. The value of $c_p$ as a function of $p$. Note that the threshold $\lambda_p$ defined in Lemma 5 is of the form $\lambda_p = c_p\lambda^{\frac{1}{2-p}}$.

Proof: We only consider the case $0 < p < 1$ and $u > 0$. The proof is straightforward for $p = 0$ and $p = 1$, due to the explicit form of $\eta_p$ in these two cases. Consider the notations $\xi_p(x,u)$ and $g(\zeta)$ introduced in the proof of Lemma 4. Note that according to the proof of Lemma 4, if $\eta_p(u;\lambda) > 0$, then $g(\eta_p(u;\lambda)) = u$. As the first step, we would like to prove that if $\eta_p(u_0;\lambda) > 0$ for some $u_0$, then $\eta_p(u;\lambda) > 0$ for any $u > u_0$. Since $u > 0$, it is straightforward to see that $\eta_p(u;\lambda) \ge 0$. Hence, $\eta_p(u;\lambda)$ is either equal to zero or it is the solution of $g(\zeta) = u$ with $\zeta > \zeta^*$ ($\zeta^*$ is defined in Lemma 4). Let $\tilde\zeta > \zeta^*$ denote the solution of $g(\tilde\zeta) = u$. Our goal is to show that $\xi_p(0,u) - \xi_p(\tilde\zeta,u) > 0$. Toward this goal, we prove that $\xi_p(0,u) - \xi_p(\tilde\zeta,u)$ is an increasing function of $u$.

We have
$$\begin{aligned}\xi_p(0,u) - \xi_p(\tilde\zeta,u) &= \tfrac{1}{2}u^2 - \tfrac{1}{2}(\tilde\zeta - u)^2 - \lambda\tilde\zeta^p\\ &= \tilde\zeta u - \tfrac{1}{2}\tilde\zeta^2 - \lambda\tilde\zeta^p = \tilde\zeta(\tilde\zeta + \lambda p\tilde\zeta^{p-1}) - \tfrac{1}{2}\tilde\zeta^2 - \lambda\tilde\zeta^p\\ &= \tfrac{1}{2}\tilde\zeta^2 + \lambda(p-1)\tilde\zeta^p. \end{aligned} \quad (22)$$
The derivative of this function with respect to $\tilde\zeta$ is $\tilde\zeta + \lambda p(p-1)\tilde\zeta^{p-1} = \tilde\zeta^{p-1}\big(\tilde\zeta^{2-p} - \lambda p(1-p)\big)$, so $\xi_p(0,u) - \xi_p(\tilde\zeta,u)$ is an increasing function of $\tilde\zeta$ for $\tilde\zeta > \zeta^*$. If we prove that $\tilde\zeta$ is an increasing function of $u$, then we can conclude that $\xi_p(0,u) - \xi_p(\tilde\zeta,u)$ is an increasing function of $u$. Note that $g(\tilde\zeta) = u$. By taking the derivative of both sides with respect to $u$, we obtain
$$g'(\tilde\zeta)\,\frac{d\tilde\zeta}{du} = 1.$$
Again, since $g'(\tilde\zeta) > 0$ for $\tilde\zeta > \zeta^*$, we conclude that $\tilde\zeta$ is an increasing function of $u$. Hence, $\xi_p(0,u) - \xi_p(\tilde\zeta,u)$ is an increasing function of $u$. If $\eta_p(u_0;\lambda) > 0$, we know that $\xi_p(0,u_0) - \xi_p(\tilde\zeta_0,u_0) \ge 0$, where $g(\tilde\zeta_0) = u_0$. Since $\xi_p(0,u) - \xi_p(\tilde\zeta,u)$ is an increasing function of $u$, we have $\xi_p(0,u) - \xi_p(\tilde\zeta,u) > 0$ for every $u > u_0$, which implies that $\eta_p(u;\lambda) = \tilde\zeta > 0$.

So far we have proved that there exists an interval $[-\lambda_p,\lambda_p]$ such that $|\eta_p(u;\lambda)| > 0$ if $|u| > \lambda_p$, and $\eta_p(u;\lambda) = 0$ for every $|u| < \lambda_p$. Note that according to Lemma 4, $\lambda_p \ge g(\zeta^*)$. Our next goal is to derive the exact form of $\lambda_p$. For notational simplicity, define $\alpha = \lambda^{\frac{1}{2-p}}$ and
$$c_p = \sup\{u : \eta_p(u;1) = 0\}. \quad (23)$$
We have
$$\eta_p(u;\lambda) = \eta_p(u;\alpha^{2-p}\cdot 1) = \alpha\,\eta_p\Big(\frac{u}{\alpha};1\Big),$$
where the second equality is due to Lemma 3 part (ii). Hence, $\eta_p(u;\lambda) = 0$ if and only if $\eta_p(u/\alpha;1) = 0$; i.e., $\eta_p(u;\lambda) = 0$ for $u/\alpha < c_p$ and $\eta_p(u;\lambda) > 0$ for $u/\alpha > c_p$. Therefore, we have $\lambda_p = c_p\alpha = c_p\lambda^{\frac{1}{2-p}}$. Finally, we aim to obtain the explicit form of $c_p$. Denote the larger solution of $x + p x^{p-1} = u$ by $x^*$. Then note that $\eta_p(u;1) = 0$ implies
$$\frac{u^2}{2} \le \frac{1}{2}(x^* - u)^2 + (x^*)^p = \frac{u^2}{2} + \frac{(x^*)^2}{2} - x^*\big(x^* + p(x^*)^{p-1}\big) + (x^*)^p,$$
i.e., $\tfrac{1}{2}(x^*)^2 \le (1-p)(x^*)^p$, which yields $x^* \le [2(1-p)]^{\frac{1}{2-p}}$. Since $x^*$ is an increasing function of $u$, we know $u \le [2(1-p)]^{\frac{1}{2-p}} + p[2(1-p)]^{\frac{p-1}{2-p}}$. On the other hand, it is straightforward to show that $\eta_p(u;1) = 0$ when $u = [2(1-p)]^{\frac{1}{2-p}} + p[2(1-p)]^{\frac{p-1}{2-p}}$. Combining this with the definition of $c_p$ in (23) gives us its analytical formula.
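The closed form of $c_p$ can be checked against a numerically located threshold. The sketch below re-defines the bisection-based evaluation of $\eta_p$ suggested by the proof of Lemma 4 (so the snippet is self-contained; all names are hypothetical), finds $\sup\{u : \eta_p(u;\lambda) = 0\}$ by a second bisection in $u$, and compares it with $c_p\lambda^{1/(2-p)}$.

```python
import math


def eta_p(u, lam, p, iters=200):
    """l_p proximal map for 0 < p < 1 via the larger root of g(z) = |u| (proof of Lemma 4)."""
    a = abs(u)
    zeta_star = (lam * p * (1.0 - p)) ** (1.0 / (2.0 - p))
    g = lambda z: z + lam * p * z ** (p - 1.0)
    if a <= g(zeta_star):
        return 0.0
    lo, hi = zeta_star, a
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < a else (lo, mid)
    x = 0.5 * (lo + hi)
    return math.copysign(x, u) if 0.5 * (a - x) ** 2 + lam * x ** p < 0.5 * a * a else 0.0


def c_p(p):
    """Closed form of c_p from Lemma 5, valid for 0 <= p <= 1 (0.0 ** 0.0 == 1.0 in Python)."""
    b = 2.0 * (1.0 - p)
    return b ** (1.0 / (2.0 - p)) + p * b ** ((p - 1.0) / (2.0 - p))


def numeric_threshold(lam, p, iters=100):
    """Bisection for sup{u >= 0 : eta_p(u; lam) = 0}, using that eta_p vanishes exactly below it."""
    lo, hi = 0.0, 10.0 * max(1.0, lam)  # eta_p(lo) == 0; hi lies well above c_p * lam^(1/(2-p))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if eta_p(mid, lam, p) == 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for p in (0.25, 0.5, 0.75):
        print(p, numeric_threshold(1.0, p), c_p(p))
```

At the endpoints the formula reproduces the familiar thresholds, $c_0 = \sqrt{2}$ (hard thresholding at $\sqrt{2\lambda}$) and $c_1 = 1$ (soft thresholding at $\lambda$).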


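Lemma 6. For $0 < p < 1$, if $|\eta_p(u;\lambda)| > 0$, then $|\eta_p(u;\lambda)| \ge [2(1-p)]^{\frac{1}{2-p}}\lambda^{\frac{1}{2-p}}$.

Proof: According to Lemma 5 and Lemma 3 part (ii), when $\eta_p(u;\lambda) > 0$, we know $\eta_p(u;\lambda) \ge \eta_p(c_p\lambda^{\frac{1}{2-p}};\lambda)$. Furthermore, from the proof of Lemma 5, it is straightforward to confirm the following equation:
$$c_p\lambda^{\frac{1}{2-p}} = \eta_p\big(c_p\lambda^{\frac{1}{2-p}};\lambda\big) + \lambda p\Big(\eta_p\big(c_p\lambda^{\frac{1}{2-p}};\lambda\big)\Big)^{p-1}, \quad (24)$$
where the equation $c_p\lambda^{\frac{1}{2-p}} = x + \lambda p x^{p-1}$ has two roots and $x = \eta_p(c_p\lambda^{\frac{1}{2-p}};\lambda)$ is the larger one. Dividing both sides of the above equation by $\lambda^{\frac{1}{2-p}}$ (and using Lemma 3 part (ii)) gives
$$c_p = \eta_p(c_p;1) + p\,\big(\eta_p(c_p;1)\big)^{p-1}. \quad (25)$$
According to the explicit form of $c_p$ in Lemma 5, we obtain $\eta_p(c_p;1) = [2(1-p)]^{\frac{1}{2-p}}$. ■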
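Lemma 6 says that $\eta_p$ jumps from $0$ to $[2(1-p)\lambda]^{1/(2-p)}$ at the threshold. A small sketch (again re-implementing the bisection-based $\eta_p$ from the proof of Lemma 4 under hypothetical names) evaluates $\eta_p$ just below and just above $c_p\lambda^{1/(2-p)}$ and checks identity (25).

```python
import math


def eta_p(u, lam, p, iters=200):
    """l_p proximal map for 0 < p < 1 via the larger root of g(z) = |u| (proof of Lemma 4)."""
    a = abs(u)
    zeta_star = (lam * p * (1.0 - p)) ** (1.0 / (2.0 - p))
    g = lambda z: z + lam * p * z ** (p - 1.0)
    if a <= g(zeta_star):
        return 0.0
    lo, hi = zeta_star, a
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < a else (lo, mid)
    x = 0.5 * (lo + hi)
    return math.copysign(x, u) if 0.5 * (a - x) ** 2 + lam * x ** p < 0.5 * a * a else 0.0


def c_p(p):
    """Closed form of c_p from Lemma 5."""
    b = 2.0 * (1.0 - p)
    return b ** (1.0 / (2.0 - p)) + p * b ** ((p - 1.0) / (2.0 - p))


def jump_size(lam, p):
    """Lower bound of Lemma 6: [2(1-p)]^(1/(2-p)) * lam^(1/(2-p))."""
    return (2.0 * (1.0 - p) * lam) ** (1.0 / (2.0 - p))


if __name__ == "__main__":
    p, lam = 0.5, 1.0
    t = c_p(p) * lam ** (1.0 / (2.0 - p))  # threshold lambda_p
    print(eta_p(t * (1 - 1e-9), lam, p), eta_p(t * (1 + 1e-9), lam, p), jump_size(lam, p))
```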
So far we have studied some of the main properties of $\eta_p(u;\lambda)$. In this paper, we will also work with the derivatives of $\eta_p(u;\lambda)$. Note that the derivative of this function with respect to $u$ exists for every $u$ except at $|u| = c_p\lambda^{\frac{1}{2-p}}$. Furthermore, its derivative with respect to $\lambda$ exists everywhere except at $\lambda = (|u|/c_p)^{2-p}$. For notational simplicity, we use the following notations for the partial derivatives:
$$\partial_1\eta_p(u;\lambda) = \frac{\partial\eta_p(u;\lambda)}{\partial u},\qquad \partial_2\eta_p(u;\lambda) = \frac{\partial\eta_p(u;\lambda)}{\partial\lambda},\qquad \partial_1^2\eta_p(u;\lambda) = \frac{\partial^2\eta_p(u;\lambda)}{\partial u^2},\qquad \partial_2^2\eta_p(u;\lambda) = \frac{\partial^2\eta_p(u;\lambda)}{\partial\lambda^2}.$$
Whenever we use these notations, we refer to the derivative of the function for the values of $u$ and $\lambda$ at which $|\eta_p(u;\lambda)| > 0$.


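Lemma 7. If $\eta_p(u;\lambda) > 0$, then for $0 < p < 1$ and $\lambda > 0$, $\eta_p(u;\lambda)$ satisfies

(i) $\eta_p(u;\lambda) < u$.

(ii) $1 < \sup_{u:\,\eta_p(u;\lambda) > 0}\partial_1\eta_p(u;\lambda) < \infty$.

(iii) $\partial_1^2\eta_p(u;\lambda) < 0$.

Furthermore, since $\eta_p(u;\lambda)$ is an odd function, $\partial_1\eta_p(u;\lambda)$ and $\partial_1^2\eta_p(u;\lambda)$ are even and odd functions, respectively. Therefore, for $\eta_p(u;\lambda) < 0$, we have (i) $\eta_p(u;\lambda) > u$, (ii) $1 < \sup_{u:\,\eta_p(u;\lambda) < 0}\partial_1\eta_p(u;\lambda) < \infty$, (iii) $\partial_1^2\eta_p(u;\lambda) > 0$.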


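Proof: In this proof, we only consider the case $\eta_p(u;\lambda) > 0$. Note that $\eta_p(u;\lambda)$ satisfies
$$\eta_p(u;\lambda) - u + \lambda p\,\eta_p^{p-1}(u;\lambda) = 0.$$
Since $\eta_p(u;\lambda) > 0$, we have $\eta_p(u;\lambda) < u$. Taking the derivative of both sides of the equation above with respect to $u$, we obtain
$$\partial_1\eta_p(u;\lambda) - 1 + \lambda p(p-1)\eta_p^{p-2}(u;\lambda)\,\partial_1\eta_p(u;\lambda) = 0. \quad (26)$$
Therefore, the derivative of $\eta_p(u;\lambda)$ is
$$\partial_1\eta_p(u;\lambda) = \frac{1}{1 + \lambda p(p-1)\eta_p^{p-2}(u;\lambda)}.$$
Furthermore, based on Lemma 6, we have
$$0 > \lambda p(p-1)\eta_p^{p-2}(u;\lambda) \ge \lambda p(p-1)\Big([2(1-p)]^{\frac{1}{2-p}}\lambda^{\frac{1}{2-p}}\Big)^{p-2} = \frac{p(p-1)}{2(1-p)} = -\frac{p}{2},$$
and hence $1 < \partial_1\eta_p(u;\lambda) \le \frac{2}{2-p}$. Note that the inequality above holds for every possible $u$ and $\lambda$ such that $\eta_p(u;\lambda) > 0$, which proves (ii). We now prove the third part of the lemma. By taking another derivative of (26) with respect to $u$, we obtain
$$\partial_1^2\eta_p(u;\lambda) + \lambda p(p-1)(p-2)\eta_p^{p-3}(u;\lambda)\big(\partial_1\eta_p(u;\lambda)\big)^2 + \lambda p(p-1)\eta_p^{p-2}(u;\lambda)\,\partial_1^2\eta_p(u;\lambda) = 0.$$
Hence,
$$\partial_1^2\eta_p(u;\lambda) = \frac{-\lambda p(p-1)(p-2)\eta_p^{p-3}(u;\lambda)\big(\partial_1\eta_p(u;\lambda)\big)^2}{1 + \lambda p(p-1)\eta_p^{p-2}(u;\lambda)}.$$
Again, by employing Lemma 6, the denominator is at least $1 - p/2 > 0$, while the numerator is negative because $p(p-1)(p-2) > 0$; hence the second derivative is negative. We may also claim that
$$\sup_u|\partial_1\eta_p(u;\lambda)| < \infty.$$
□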

The next lemma is concerned with the properties of $\eta_p(u;\lambda)$ as a function of $\lambda$.

Lemma 8. If $|\eta_p(u;\lambda)| > 0$ and $0 < p < 1$, we have
$$\partial_2\eta_p(u;\lambda) = -p\,|\eta_p(u;\lambda)|^{p-1}\,\partial_1\eta_p(u;\lambda)\cdot\mathrm{sign}(u).$$
In particular, $\partial_2\eta_p(u;\lambda) < 0$ when $u > 0$ and $\partial_2\eta_p(u;\lambda) > 0$ when $u < 0$.
Proof: We prove the result for the case $\eta_p(u;\lambda) > 0$. The other case can be proved in exactly the same way. Note that since $\eta_p(u;\lambda) > 0$, it satisfies
$$\eta_p(u;\lambda) - u + \lambda p\,\eta_p^{p-1}(u;\lambda) = 0. \quad (27)$$
By taking the derivative of (27) with respect to $u$, we obtain
$$\partial_1\eta_p(u;\lambda) - 1 + \lambda p(p-1)\eta_p^{p-2}(u;\lambda)\,\partial_1\eta_p(u;\lambda) = 0. \quad (28)$$
By taking the derivative of (27) with respect to $\lambda$, we obtain
$$\partial_2\eta_p(u;\lambda) + p\,\eta_p^{p-1}(u;\lambda) + \lambda p(p-1)\eta_p^{p-2}(u;\lambda)\,\partial_2\eta_p(u;\lambda) = 0. \quad (29)$$
The final result can be obtained by combining (28) and (29). □
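The closed-form derivatives in Lemmas 7 and 8 lend themselves to a finite-difference check. The sketch below is self-contained: `eta_p` is the hypothetical bisection evaluator suggested by the proof of Lemma 4, and the formulas $\partial_1\eta_p = 1/(1+\lambda p(p-1)|\eta_p|^{p-2})$ and $\partial_2\eta_p = -p|\eta_p|^{p-1}\partial_1\eta_p\,\mathrm{sign}(u)$ are compared with central differences at points away from the threshold, where the derivatives do not exist.

```python
import math


def eta_p(u, lam, p, iters=200):
    """l_p proximal map for 0 < p < 1 via the larger root of g(z) = |u| (proof of Lemma 4)."""
    a = abs(u)
    zeta_star = (lam * p * (1.0 - p)) ** (1.0 / (2.0 - p))
    g = lambda z: z + lam * p * z ** (p - 1.0)
    if a <= g(zeta_star):
        return 0.0
    lo, hi = zeta_star, a
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < a else (lo, mid)
    x = 0.5 * (lo + hi)
    return math.copysign(x, u) if 0.5 * (a - x) ** 2 + lam * x ** p < 0.5 * a * a else 0.0


def d1_eta(u, lam, p):
    """Lemma 7: d eta_p / du = 1 / (1 + lam p (p-1) |eta_p|^(p-2)), valid where eta_p != 0."""
    x = abs(eta_p(u, lam, p))
    return 1.0 / (1.0 + lam * p * (p - 1.0) * x ** (p - 2.0))


def d2_eta(u, lam, p):
    """Lemma 8: d eta_p / d lam = -p |eta_p|^(p-1) * d1_eta * sign(u), valid where eta_p != 0."""
    x = abs(eta_p(u, lam, p))
    return -p * x ** (p - 1.0) * d1_eta(u, lam, p) * math.copysign(1.0, u)


def central_diff(f, x, h=1e-6):
    """Second-order central finite difference of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    lam, p, u = 1.0, 0.5, 3.0
    print(d1_eta(u, lam, p), central_diff(lambda v: eta_p(v, lam, p), u))
    print(d2_eta(u, lam, p), central_diff(lambda l: eta_p(u, l, p), lam))
```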


Below we summarize two straightforward corollaries of the above results. These two corollaries enable us to compare $\eta_p$ with $\eta_0$ and $\eta_1$. First, note that according to Lemma 5, the threshold at which $\eta_p(u;\lambda)$ switches from zero to a nonzero value is different for different values of $p$. This makes the comparison of these proximal functions complicated. However, according to Lemma 5, if we set the parameter $\lambda_p = (\lambda/c_p)^{2-p}$ with $\lambda$ being a fixed constant, then for every $0 \le p \le 1$, we have $\eta_p(u;\lambda_p) = 0$ for $|u| < \lambda$ and $\eta_p(u;\lambda_p) \neq 0$ for $|u| > \lambda$. Based on this new parametrization, we would like to compare $\eta_p$ with $\eta_0$ and $\eta_1$.

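Corollary 4. Define $\lambda_p = (\lambda/c_p)^{2-p}$. Then
$$\eta_p(u;\lambda_p) > \eta_1(u;\lambda_1),\quad \forall u > \lambda,\ 0 < p < 1.$$
Proof: Note that $\eta_1(\lambda;\lambda_1) = 0$ and $\lim_{u\downarrow\lambda}\eta_p(u;\lambda_p) > 0$ for every $0 < p < 1$. The derivative of the soft thresholding function is $\partial_1\eta_1(u;\lambda_1) = 1$ for $u > \lambda$. According to Lemma 7, the derivative of $\eta_p(u;\lambda_p)$ satisfies $\partial_1\eta_p(u;\lambda_p) > 1$ for $u > \lambda$. Therefore, we have $\eta_1(u;\lambda_1) < \eta_p(u;\lambda_p)$ when $u > \lambda$ and $0 < p < 1$. It is straightforward to check that the result also holds for $p = 0$, i.e., $\eta_1(u;\lambda_1) < \eta_0(u;\lambda_0)$. □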

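Corollary 5. Let $\lambda_p = (\lambda/c_p)^{2-p}$. We have
$$\eta_p(u;\lambda_p) < \eta_0(u;\lambda_0),\quad \forall u > \lambda,\ 0 < p \le 1.$$
Proof: Since $\eta_1(u;\lambda_1)$ admits an explicit form, it is straightforward to verify the result for $p = 1$. For $0 < p < 1$, it is a direct result of Lemma 7 part (i), since $\eta_0(u;\lambda_0) = u$ for $u > \lambda$. □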

Another type of result that we will use in this paper concerns the behavior of $\eta_p(u;\lambda)$ and its derivative for large values of $u$. The rest of this section is devoted to such results.

Lemma 9. Let $\lambda > 0$ and $0 < p < 1$ be two fixed numbers. Then for large values of $u$, we have
$$\eta_p(u;\lambda) = u - \lambda p\,\mathrm{sign}(u)|u|^{p-1} + o(|u|^{p-1}).$$
Proof: For simplicity, we only consider the case $u > 0$. First note that Corollary 4 shows
$$\eta_p(u;\lambda_p) > \eta_1(u;\lambda_1) \to \infty,\quad u \to \infty.$$
Moreover, we know that for large enough $u$, $\eta_p(u;\lambda)$ satisfies
$$\eta_p(u;\lambda) - u + \lambda p\,\eta_p^{p-1}(u;\lambda) = 0. \quad (30)$$
Define
$$v_p(u;\lambda) = u - \eta_p(u;\lambda). \quad (31)$$
If we plug (31) into (30), then we have
$$v_p(u;\lambda) = \lambda p\,\eta_p^{p-1}(u;\lambda).$$
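Finally,
$$\lim_{u\to\infty}\frac{\eta_p(u;\lambda) - u + \lambda p u^{p-1}}{u^{p-1}} = \lim_{u\to\infty}\frac{\lambda p u^{p-1} - v_p(u;\lambda)}{u^{p-1}} = \lambda p\lim_{u\to\infty}\left[1 - \left(\frac{\eta_p(u;\lambda)}{u}\right)^{p-1}\right] = 0.$$
The last equality is due to the fact that
$$\lim_{u\to\infty}\frac{\eta_p(u;\lambda)}{u} - 1 = -\lambda p\lim_{u\to\infty}\frac{\eta_p^{p-1}(u;\lambda)}{u} = 0.$$
□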


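Lemma 10. Let $0 < p < 1$ and $\lambda > 0$ be two fixed numbers. For large values of $u$, we have
$$\partial_1\eta_p(u;\lambda) = 1 + \lambda p(1-p)|u|^{p-2} + o(|u|^{p-2}).$$
Proof: We only consider $u > 0$ for simplicity. Taking the derivative of (30) with respect to $u$ leads to
$$\partial_1\eta_p(u;\lambda) = \frac{1}{1 + \lambda p(p-1)\eta_p^{p-2}(u;\lambda)}.$$
Hence, we have
$$\lim_{u\to\infty}\frac{\partial_1\eta_p(u;\lambda) - 1 - \lambda p(1-p)u^{p-2}}{u^{p-2}} = \lim_{u\to\infty}\frac{\lambda p(1-p)\big(\eta_p^{p-2}(u;\lambda) - u^{p-2}\big) + \lambda^2p^2(1-p)^2\eta_p^{p-2}(u;\lambda)\,u^{p-2}}{u^{p-2}\big(1 + \lambda p(p-1)\eta_p^{p-2}(u;\lambda)\big)} = 0.$$
To obtain the last equality, we have employed the following equalities, which are proved in the last lemma:
$$\lim_{u\to\infty}\eta_p(u;\lambda) = \infty, \quad (32) \qquad\qquad \lim_{u\to\infty}\frac{\eta_p(u;\lambda)}{u} = 1. \quad (33)$$
□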
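The expansions of Lemmas 9 and 10 can be observed numerically: the normalized remainders $(\eta_p(u;\lambda) - u + \lambda p u^{p-1})/u^{p-1}$ and $(\partial_1\eta_p(u;\lambda) - 1 - \lambda p(1-p)u^{p-2})/u^{p-2}$ should shrink as $u$ grows. This sketch uses the same hypothetical bisection evaluator and the closed-form $\partial_1\eta_p$ of Lemma 7; for very large $u$, floating-point cancellation eventually dominates the first remainder, so only moderate values are used.

```python
import math


def eta_p(u, lam, p, iters=200):
    """l_p proximal map for 0 < p < 1 via the larger root of g(z) = |u| (proof of Lemma 4)."""
    a = abs(u)
    zeta_star = (lam * p * (1.0 - p)) ** (1.0 / (2.0 - p))
    g = lambda z: z + lam * p * z ** (p - 1.0)
    if a <= g(zeta_star):
        return 0.0
    lo, hi = zeta_star, a
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < a else (lo, mid)
    x = 0.5 * (lo + hi)
    return math.copysign(x, u) if 0.5 * (a - x) ** 2 + lam * x ** p < 0.5 * a * a else 0.0


def lemma9_remainder(u, lam, p):
    """(eta_p(u) - u + lam p u^(p-1)) / u^(p-1); Lemma 9 says this tends to 0."""
    return (eta_p(u, lam, p) - u + lam * p * u ** (p - 1.0)) / u ** (p - 1.0)


def lemma10_remainder(u, lam, p):
    """(d1 eta_p(u) - 1 - lam p (1-p) u^(p-2)) / u^(p-2); Lemma 10 says this tends to 0."""
    x = eta_p(u, lam, p)
    d1 = 1.0 / (1.0 + lam * p * (p - 1.0) * x ** (p - 2.0))  # closed form from Lemma 7
    return (d1 - 1.0 - lam * p * (1.0 - p) * u ** (p - 2.0)) / u ** (p - 2.0)


if __name__ == "__main__":
    for u in (1e1, 1e2, 1e3, 1e4):
        print(u, lemma9_remainder(u, 1.0, 0.5), lemma10_remainder(u, 1.0, 0.5))
```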


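B. Smoothness of state evolution function $\Psi_{\lambda,p}(\sigma^2)$

In the paper, there are many instances at which we require the derivatives of $\Psi_{\lambda,p}(\sigma^2)$ or $\Psi_{\lambda^*,p}(\sigma^2)$. In this section, we prove all the smoothness properties that are required throughout the paper. For simplicity, we define the following notations:
$$H_p(\sigma,\lambda) \triangleq \mathbb{E}\big[\eta_p(X+\sigma Z;\lambda) - X\big]^2,\qquad \lambda^*(\sigma) = \arg\min_\lambda H_p(\sigma,\lambda).$$
Note that
$$\Psi_{\lambda^*,p}(\sigma^2) = \frac{1}{\delta}H_p(\sigma,\lambda^*(\sigma)).$$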

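Lemma 11. If $\sigma_0 > 0$ and $\lambda_0 > 0$, then $\frac{\partial H_p(\sigma,\lambda)}{\partial\sigma}$ exists at $(\sigma_0,\lambda_0)$ and is equal to
$$\left.\frac{\partial H_p(\sigma,\lambda)}{\partial\sigma}\right|_{(\sigma_0,\lambda_0)} = \frac{1}{\sigma_0}\mathbb{E}\big[(\eta_p(X+\sigma_0Z;\lambda_0) - X)^2(Z^2-1)\big].$$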

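Proof: Let $F$ denote the CDF of $X$ and $\phi$ the standard normal density. Then
$$H_p(\sigma,\lambda) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(\eta_p(x+\sigma z;\lambda) - x)^2\phi(z)\,dz\,dF(x) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(\eta_p(z;\lambda) - x)^2\,\frac{1}{\sigma}\phi\Big(\frac{z-x}{\sigma}\Big)dz\,dF(x).$$
Hence, our first goal is to show that $\int\int(\eta_p(z;\lambda) - x)^2\frac{1}{\sigma}\phi((z-x)/\sigma)\,dz\,dF(x)$ is differentiable and that the derivative may move inside the integral. For the moment, we assume that $\sigma > \sigma_0$ and we calculate
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(\eta_p(z;\lambda) - x)^2\,\frac{\frac{1}{\sigma}\phi((z-x)/\sigma) - \frac{1}{\sigma_0}\phi((z-x)/\sigma_0)}{\sigma - \sigma_0}\,dz\,dF(x).$$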
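From the mean value theorem, we conclude that
$$\frac{\frac{1}{\sigma}\phi((z-x)/\sigma) - \frac{1}{\sigma_0}\phi((z-x)/\sigma_0)}{\sigma - \sigma_0} = \left(\frac{|z-x|^2}{\tilde\sigma^4} - \frac{1}{\tilde\sigma^2}\right)\phi((z-x)/\tilde\sigma),$$
where $\tilde\sigma \in [\sigma_0,\sigma]$. Using $|\eta_p(u;\lambda)| \le |u|$, $\tilde\sigma \ge \sigma_0$, and $\phi(z/\tilde\sigma) \le \phi(z/\sigma)$, it is straightforward to confirm that
$$\begin{aligned}&\int\int(\eta_p(z;\lambda) - x)^2\left|\frac{\frac{1}{\sigma}\phi((z-x)/\sigma) - \frac{1}{\sigma_0}\phi((z-x)/\sigma_0)}{\sigma - \sigma_0}\right|dz\,dF(x)\\ &\le \int\int(\eta_p(z;\lambda) - x)^2\left(\frac{|z-x|^2}{\tilde\sigma^4} + \frac{1}{\tilde\sigma^2}\right)\phi((z-x)/\tilde\sigma)\,dz\,dF(x)\\ &= \int\int(\eta_p(x+z;\lambda) - x)^2\left(\frac{|z|^2}{\tilde\sigma^4} + \frac{1}{\tilde\sigma^2}\right)\phi(z/\tilde\sigma)\,dz\,dF(x)\\ &\le \int\int(2|x| + |z|)^2\left(\frac{|z|^2}{\sigma_0^4} + \frac{1}{\sigma_0^2}\right)\phi(z/\sigma)\,dz\,dF(x) < \infty.\end{aligned}$$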

Hence, the condition of the dominated convergence theorem holds and we can switch the integrals and the derivative to obtain
$$\begin{aligned}\frac{\partial H_p(\sigma,\lambda)}{\partial\sigma} &= -\frac{1}{\sigma^2}\int\int(\eta_p(z;\lambda) - x)^2\phi((z-x)/\sigma)\,dz\,dF(x) + \frac{1}{\sigma^2}\int\int(\eta_p(z;\lambda) - x)^2\phi((z-x)/\sigma)\frac{(z-x)^2}{\sigma^2}\,dz\,dF(x)\\ &= \frac{1}{\sigma}\int\int(\eta_p(x+\sigma z;\lambda) - x)^2(z^2-1)\phi(z)\,dz\,dF(x) \quad (34)\\ &= \frac{1}{\sigma}\mathbb{E}\big[(\eta_p(X+\sigma Z;\lambda) - X)^2(Z^2-1)\big].\end{aligned}$$
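Lemma 11 can be checked numerically in the simplest case $p = 1$ (soft thresholding) with a two-point prior $X \sim (1-\epsilon)\Delta_0 + \epsilon\Delta_1$. The sketch below is an illustration under these assumptions, not the paper's code: it computes $H_1(\sigma,\lambda)$ by Simpson quadrature, split at the kinks of the soft-thresholding function, and compares a central finite difference in $\sigma$ with the right-hand side of Lemma 11. All function names are hypothetical.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def soft(u, lam):
    """Soft thresholding eta_1(u; lam) = sign(u) * max(|u| - lam, 0)."""
    return math.copysign(max(abs(u) - lam, 0.0), u)


def gauss_expect(f, kinks=(), lo=-12.0, hi=12.0, n=4000):
    """Approximate E[f(Z)], Z ~ N(0, 1), with Simpson's rule (n even) on each smooth piece.

    Splitting at the points in `kinks`, where f is not smooth, keeps Simpson accurate;
    the Gaussian tails beyond |z| = 12 are negligible here.
    """
    pts = sorted({lo, hi, *(k for k in kinks if lo < k < hi)})
    total = 0.0
    for a, b in zip(pts[:-1], pts[1:]):
        h = (b - a) / n
        acc = 0.0
        for i in range(n + 1):
            z = a + i * h
            w = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
            acc += w * f(z) * math.exp(-0.5 * z * z)
        total += acc * h / 3.0
    return total / SQRT_2PI


def _expect_given_x(x, sigma, lam, weight):
    """E[(eta_1(x + sigma Z; lam) - x)^2 * weight(Z)] for a fixed value x of X."""
    kinks = ((lam - x) / sigma, (-lam - x) / sigma)
    return gauss_expect(lambda z: (soft(x + sigma * z, lam) - x) ** 2 * weight(z), kinks)


def H(sigma, lam, eps):
    """H_1(sigma, lam) for X ~ (1 - eps) Delta_0 + eps Delta_1."""
    one = lambda z: 1.0
    return (1 - eps) * _expect_given_x(0.0, sigma, lam, one) + eps * _expect_given_x(1.0, sigma, lam, one)


def dH_dsigma(sigma, lam, eps):
    """Right-hand side of Lemma 11: (1/sigma) E[(eta_1(X + sigma Z) - X)^2 (Z^2 - 1)]."""
    w = lambda z: z * z - 1.0
    return ((1 - eps) * _expect_given_x(0.0, sigma, lam, w) + eps * _expect_given_x(1.0, sigma, lam, w)) / sigma


if __name__ == "__main__":
    s, lam, eps, d = 0.8, 0.6, 0.3, 1e-4
    print((H(s + d, lam, eps) - H(s - d, lam, eps)) / (2 * d), dH_dsigma(s, lam, eps))
```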

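Lemma 12. $\frac{\partial H_p(\sigma,\lambda)}{\partial\sigma}$ is a continuous function of $(\lambda,\sigma)$ for any $\lambda > 0$ and $\sigma > 0$.

Proof: Define $J(x,\sigma,\lambda) = \mathbb{E}[(\eta_p(x+\sigma Z;\lambda) - x)^2(Z^2-1)]$. Lemma 11 proves that
$$\frac{\partial H_p(\sigma,\lambda)}{\partial\sigma} = \frac{1}{\sigma}\mathbb{E}[J(X,\sigma,\lambda)].$$
We first show that $J(x,\sigma,\lambda)$ is continuous for any $\lambda > 0$, $\sigma > 0$, given any fixed $x$. We start by rewriting $J(x,\sigma,\lambda)$, using $\mathbb{E}[Z^2-1] = 0$:
$$J(x,\sigma,\lambda) = \underbrace{\mathbb{E}[\eta_p^2(x+\sigma Z;\lambda)(Z^2-1)]}_{=J_1(x,\sigma,\lambda)} - 2x\underbrace{\mathbb{E}[\eta_p(x+\sigma Z;\lambda)(Z^2-1)]}_{=J_2(x,\sigma,\lambda)}.$$
Regarding $J_1(x,\sigma,\lambda)$, we have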


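$$\begin{aligned}J_1(x,\sigma,\lambda) &\overset{(a)}{=} \lambda^{\frac{2}{2-p}}\,\mathbb{E}\Big[\eta_p^2\big(\lambda^{-\frac{1}{2-p}}(x+\sigma Z);1\big)(Z^2-1)\Big] = \lambda^{\frac{2}{2-p}}\int_{-\infty}^{\infty}\eta_p^2\big(\lambda^{-\frac{1}{2-p}}(x+\sigma z);1\big)(z^2-1)\phi(z)\,dz\\
&= \frac{\lambda^{\frac{3}{2-p}}}{\sigma}\int_{-\infty}^{\infty}\eta_p^2(z;1)\left[\Big(\frac{z\lambda^{\frac{1}{2-p}} - x}{\sigma}\Big)^2 - 1\right]\phi\Big(\frac{z\lambda^{\frac{1}{2-p}} - x}{\sigma}\Big)dz\\
&= \frac{\lambda^{\frac{5}{2-p}}e^{-\frac{x^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma^3}\int_{-\infty}^{\infty}\eta_p^2(z;1)\,z^2\exp\Big(-\frac{\lambda^{\frac{2}{2-p}}}{2\sigma^2}z^2 + \frac{x\lambda^{\frac{1}{2-p}}}{\sigma^2}z\Big)dz\\
&\quad - \frac{2x\lambda^{\frac{4}{2-p}}e^{-\frac{x^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma^3}\int_{-\infty}^{\infty}\eta_p^2(z;1)\,z\exp\Big(-\frac{\lambda^{\frac{2}{2-p}}}{2\sigma^2}z^2 + \frac{x\lambda^{\frac{1}{2-p}}}{\sigma^2}z\Big)dz\\
&\quad + \frac{(x^2 - \sigma^2)\lambda^{\frac{3}{2-p}}e^{-\frac{x^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma^3}\int_{-\infty}^{\infty}\eta_p^2(z;1)\exp\Big(-\frac{\lambda^{\frac{2}{2-p}}}{2\sigma^2}z^2 + \frac{x\lambda^{\frac{1}{2-p}}}{\sigma^2}z\Big)dz. \quad (35)\end{aligned}$$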
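We have used Lemma 3 part (ii) to derive (a). Denote $\theta_1 = -\frac{\lambda^{2/(2-p)}}{2\sigma^2}$ and $\theta_2 = \frac{x\lambda^{1/(2-p)}}{\sigma^2}$. Then $c(\theta_1,\theta_2)\exp(\theta_1z^2 + \theta_2z)$ defines a two-parameter exponential family with natural parameter space $\{(\theta_1,\theta_2)\mid\theta_1 < 0,\ \theta_2\in\mathbb{R}\}$, where $c(\theta_1,\theta_2)$ is the normalization constant. Hence, according to Theorem 2.7.1 in […], $\int\eta_p^2(z;1)z^2\exp(\theta_1z^2 + \theta_2z)\,dz$ is continuous with respect to $(\theta_1,\theta_2)$ in the natural parameter space. This further implies that $\int\eta_p^2(z;1)z^2\exp\big(-\frac{\lambda^{2/(2-p)}}{2\sigma^2}z^2 + \frac{x\lambda^{1/(2-p)}}{\sigma^2}z\big)dz$ is continuous for $\lambda > 0$, $\sigma > 0$. Therefore, we can conclude that the first term on the right-hand side of (35) is continuous. Similar arguments work for the second and third terms. Showing the continuity of $J_2(x,\sigma,\lambda)$ is also similar and is skipped. Now consider any given $\sigma_0 > 0$, $\lambda_0 > 0$. It is straightforward to verify the existence of $c_1, c_2 > 0$ such that, for all $(\sigma,\lambda)$ in a neighborhood of $(\sigma_0,\lambda_0)$,
$$|J(x,\sigma,\lambda)| \le \mathbb{E}\big[(2|x+\sigma Z|^2 + 2x^2)(Z^2+1)\big] \le \mathbb{E}\big[(4\sigma^2Z^2 + 6x^2)(Z^2+1)\big] \le c_1x^2 + c_2. \quad (36)$$
Hence, we can apply the dominated convergence theorem to obtain
$$\lim_{\lambda\to\lambda_0,\ \sigma\to\sigma_0}\mathbb{E}[J(X,\sigma,\lambda)] = \mathbb{E}\Big[\lim_{\lambda\to\lambda_0,\ \sigma\to\sigma_0}J(X,\sigma,\lambda)\Big] = \mathbb{E}[J(X,\sigma_0,\lambda_0)].$$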


Lemma 13. $\frac{\partial H_p(\sigma,\lambda)}{\partial\lambda}$ is a continuous function of $(\lambda,\sigma)$ for any $\lambda > 0$ and $\sigma > 0$.

The proof is similar to the proof of Lemma 12 and is hence skipped here.

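Lemma 14. For a given $\sigma_0 > 0$, suppose the optimal thresholding value $\lambda^*(\sigma_0)$ satisfies the condition
$$\inf_{\lambda>0}H_p(\sigma_0,\lambda) < \inf_{|\lambda-\lambda^*(\sigma_0)|>\epsilon}H_p(\sigma_0,\lambda)$$
for any $\epsilon > 0$. Then $\lambda^*(\sigma)$ is continuous at $\sigma = \sigma_0$.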
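Proof: According to Lemma 11, we have
$$\left|\frac{\partial H_p(\sigma,\lambda)}{\partial\sigma}\right| = \frac{1}{\sigma}\Big|\mathbb{E}\big[(\eta_p(X+\sigma Z;\lambda) - X)^2(Z^2-1)\big]\Big| \le \frac{1}{\sigma}\mathbb{E}\big[(Z^2+1)(2\eta_p^2(X+\sigma Z;\lambda) + 2X^2)\big] \le \frac{1}{\sigma}\mathbb{E}\big[(6X^2 + 4\sigma^2Z^2)(Z^2+1)\big]. \quad (37)$$
Note that the upper bound above does not depend on $\lambda$. This implies that for any given $\sigma_0 > 0$, there exists a neighborhood $B_r(\sigma_0)$ such that the following holds for any $\sigma\in B_r(\sigma_0)$:
$$\sup_\lambda|H_p(\sigma,\lambda) - H_p(\sigma_0,\lambda)| \le K(\sigma_0)\cdot|\sigma - \sigma_0|,$$
where $K(\sigma_0)$ is a constant depending on $\sigma_0$. Since $H_p(\sigma,\lambda^*(\sigma)) \le H_p(\sigma,\lambda^*(\sigma_0))$, we then have
$$\begin{aligned}H_p(\sigma,\lambda^*(\sigma)) - H_p(\sigma_0,\lambda^*(\sigma_0)) &= [H_p(\sigma,\lambda^*(\sigma)) - H_p(\sigma,\lambda^*(\sigma_0))] + [H_p(\sigma,\lambda^*(\sigma_0)) - H_p(\sigma_0,\lambda^*(\sigma_0))]\\ &\le \sup_\lambda|H_p(\sigma,\lambda) - H_p(\sigma_0,\lambda)| \le K(\sigma_0)\cdot|\sigma - \sigma_0|.\end{aligned}$$
On the other hand,
$$\begin{aligned}H_p(\sigma,\lambda^*(\sigma)) - H_p(\sigma_0,\lambda^*(\sigma_0)) &= [H_p(\sigma,\lambda^*(\sigma)) - H_p(\sigma_0,\lambda^*(\sigma))] + [H_p(\sigma_0,\lambda^*(\sigma)) - H_p(\sigma_0,\lambda^*(\sigma_0))]\\ &\ge -\sup_\lambda|H_p(\sigma,\lambda) - H_p(\sigma_0,\lambda)| \ge -K(\sigma_0)\cdot|\sigma - \sigma_0|.\end{aligned}$$
Therefore, we obtain
$$\Big|\inf_{\lambda>0}H_p(\sigma,\lambda) - \inf_{\lambda>0}H_p(\sigma_0,\lambda)\Big| \le K(\sigma_0)\cdot|\sigma - \sigma_0|. \quad (38)$$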
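Similarly, we can get
$$\Big|\inf_{|\lambda-\lambda^*(\sigma_0)|>\epsilon}H_p(\sigma,\lambda) - \inf_{|\lambda-\lambda^*(\sigma_0)|>\epsilon}H_p(\sigma_0,\lambda)\Big| \le K(\sigma_0)\cdot|\sigma - \sigma_0|. \quad (39)$$
Now for any given $\epsilon > 0$, by the condition we impose, there exists a constant $d > 0$ such that
$$\inf_{|\lambda-\lambda^*(\sigma_0)|>\epsilon}H_p(\sigma_0,\lambda) - \inf_{\lambda>0}H_p(\sigma_0,\lambda) > d.$$
This, combined with (38) and (39), yields
$$\inf_{|\lambda-\lambda^*(\sigma_0)|>\epsilon}H_p(\sigma,\lambda) - \inf_{\lambda>0}H_p(\sigma,\lambda) > d - 2K(\sigma_0)\cdot|\sigma - \sigma_0| > d/2 > 0$$
for $\sigma\in B_r(\sigma_0)$ with sufficiently small $r$. It implies that
$$|\lambda^*(\sigma) - \lambda^*(\sigma_0)| \le \epsilon,\quad\text{for }\sigma\in B_r(\sigma_0).$$
This finishes the proof of the continuity.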


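Theorem 12. Suppose that for any $\sigma_0 > 0$, the global optimum $\lambda^*(\sigma_0)$ is isolated,‡ i.e.,
$$\inf_{\lambda>0}H_p(\sigma_0,\lambda) < \inf_{|\lambda-\lambda^*(\sigma_0)|>\epsilon}H_p(\sigma_0,\lambda)$$
for any $\epsilon > 0$. Then $\Psi_{\lambda^*,p}(\sigma^2)$ is differentiable with respect to $\sigma$ over $(0,\infty)$ with continuous derivative, and
$$\frac{d\Psi_{\lambda^*,p}(\sigma^2)}{d\sigma} = \frac{1}{\delta}\frac{\partial H_p(\sigma,\lambda^*(\sigma))}{\partial\sigma}.$$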


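Proof: Consider a given $\sigma_0 > 0$. Then
$$\frac{d\Psi_{\lambda^*,p}(\sigma^2)}{d\sigma}\bigg|_{\sigma=\sigma_0} = \frac{1}{\delta}\frac{dH_p(\sigma,\lambda^*(\sigma))}{d\sigma}\bigg|_{\sigma=\sigma_0} = \frac{1}{\delta}\lim_{\sigma\to\sigma_0}\frac{H_p(\sigma,\lambda^*(\sigma)) - H_p(\sigma_0,\lambda^*(\sigma_0))}{\sigma - \sigma_0}.$$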

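We first assume $\sigma > \sigma_0$. Note that
$$\begin{aligned}\frac{H_p(\sigma,\lambda^*(\sigma)) - H_p(\sigma_0,\lambda^*(\sigma_0))}{\sigma - \sigma_0} &= \frac{[H_p(\sigma,\lambda^*(\sigma)) - H_p(\sigma,\lambda^*(\sigma_0))] + [H_p(\sigma,\lambda^*(\sigma_0)) - H_p(\sigma_0,\lambda^*(\sigma_0))]}{\sigma - \sigma_0}\\ &\le \frac{H_p(\sigma,\lambda^*(\sigma_0)) - H_p(\sigma_0,\lambda^*(\sigma_0))}{\sigma - \sigma_0}.\end{aligned}$$
Hence, we have
$$\limsup_{\sigma\to\sigma_0^+}\frac{H_p(\sigma,\lambda^*(\sigma)) - H_p(\sigma_0,\lambda^*(\sigma_0))}{\sigma - \sigma_0} \le \frac{\partial H_p(\sigma_0,\lambda^*(\sigma_0))}{\partial\sigma}. \quad (40)$$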

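On the other hand, we have
$$\begin{aligned}\frac{H_p(\sigma,\lambda^*(\sigma)) - H_p(\sigma_0,\lambda^*(\sigma_0))}{\sigma - \sigma_0} &= \frac{[H_p(\sigma,\lambda^*(\sigma)) - H_p(\sigma_0,\lambda^*(\sigma))] + [H_p(\sigma_0,\lambda^*(\sigma)) - H_p(\sigma_0,\lambda^*(\sigma_0))]}{\sigma - \sigma_0}\\ &\ge \frac{H_p(\sigma,\lambda^*(\sigma)) - H_p(\sigma_0,\lambda^*(\sigma))}{\sigma - \sigma_0} = \frac{\partial H_p(\tilde\sigma,\lambda^*(\sigma))}{\partial\sigma},\end{aligned}$$
where $\tilde\sigma$ is between $\sigma$ and $\sigma_0$. Since we have shown in Lemmas 12 and 14 that $\frac{\partial H_p(\sigma,\lambda)}{\partial\sigma}$ and $\lambda^*(\sigma)$ are both continuous, we can conclude from the above inequality that
$$\liminf_{\sigma\to\sigma_0^+}\frac{H_p(\sigma,\lambda^*(\sigma)) - H_p(\sigma_0,\lambda^*(\sigma_0))}{\sigma - \sigma_0} \ge \frac{\partial H_p(\sigma_0,\lambda^*(\sigma_0))}{\partial\sigma}. \quad (41)$$
Inequalities (40) and (41) together show that
$$\lim_{\sigma\to\sigma_0^+}\frac{H_p(\sigma,\lambda^*(\sigma)) - H_p(\sigma_0,\lambda^*(\sigma_0))}{\sigma - \sigma_0} = \frac{\partial H_p(\sigma_0,\lambda^*(\sigma_0))}{\partial\sigma}.$$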

‡ This assumption turns out to be very mild. Based on our simulations, $H_p(\sigma,\lambda)$, as a function of $\lambda$, has a quasi-convex shape.
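Similarly, we can prove the same equality when $\sigma\to\sigma_0^-$. Thus, we obtain
$$\frac{d\Psi_{\lambda^*,p}(\sigma^2)}{d\sigma}\bigg|_{\sigma=\sigma_0} = \frac{1}{\delta}\frac{\partial H_p(\sigma_0,\lambda^*(\sigma_0))}{\partial\sigma}.$$
Since $\frac{\partial H_p(\sigma,\lambda)}{\partial\sigma}$ and $\lambda^*(\sigma)$ are both continuous, we know that $\frac{d\Psi_{\lambda^*,p}(\sigma^2)}{d\sigma}$ is continuous as well.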


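Theorem 13. Denote $\theta = (\lambda,p)$. Suppose that for any $\sigma_0 > 0$, the global optimum $\theta^*(\sigma_0)$ is isolated, i.e.,
$$\inf_{\lambda>0,\,0<p<1}H_p(\sigma_0,\lambda) < \inf_{\|\theta-\theta^*(\sigma_0)\|>\epsilon}H_p(\sigma_0,\lambda)$$
for any $\epsilon > 0$. Then $\Psi_{\lambda^*,p^*}(\sigma^2)$ is differentiable with respect to $\sigma$ over $(0,\infty)$ with continuous derivative, and
$$\frac{d\Psi_{\lambda^*,p^*}(\sigma^2)}{d\sigma} = \frac{1}{\delta}\frac{\partial H_{p^*(\sigma)}(\sigma,\lambda^*(\sigma))}{\partial\sigma}.$$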

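Proof: First note that we can prove that $\theta^*(\sigma)$ is continuous over $(0,\infty)$. The argument follows the same route as the proof of Lemma 14. The key observation is that the upper bound on $\frac{\partial H_p(\sigma,\lambda)}{\partial\sigma}$ that we showed in (37) does not depend on either $p$ or $\lambda$. For the sake of brevity, we skip the complete proof.

The rest of the proof is also very similar to the proof of Theorem 12. Note that the key ingredient in the proof of Theorem 12 is the continuity of $\frac{\partial H_p(\sigma,\lambda)}{\partial\sigma}$ with respect to $(\sigma,\lambda)$. In order to extend that proof to Theorem 13, we should show that $\frac{\partial H_p(\sigma,\lambda)}{\partial\sigma}$ is continuous with respect to $(\sigma,\lambda,p)$. Recall from (34) that
$$\frac{\partial H_p(\sigma,\lambda)}{\partial\sigma} = \frac{1}{\sigma}\mathbb{E}\big[(\eta_p(X+\sigma Z;\lambda) - X)^2(Z^2-1)\big].$$
We can use the same arguments as presented for proving Lemma 11 to calculate
$$\frac{\partial^2H_p(\sigma,\lambda)}{\partial\sigma^2} = \frac{1}{\sigma^2}\mathbb{E}\big[(\eta_p(X+\sigma Z;\lambda) - X)^2(Z^4 - 5Z^2 + 2)\big].$$
Hence, it is straightforward to verify that
$$\left|\frac{\partial^2H_p(\sigma,\lambda)}{\partial\sigma^2}\right| \le \frac{1}{\sigma^2}\mathbb{E}\big[(6X^2 + 4\sigma^2Z^2)(Z^4 + 5Z^2 + 2)\big].$$
Note that the upper bound above is independent of both $p$ and $\lambda$. Thus, according to the mean value theorem, $\frac{\partial H_p(\sigma,\lambda)}{\partial\sigma}$ is Lipschitz continuous with respect to $\sigma > 0$ (locally, with a Lipschitz constant that does not depend on $\lambda$ and $p$). If we can further show that $\mathbb{E}[(\eta_p(X+\sigma Z;\lambda) - X)^2(Z^2-1)]$ is continuous with respect to $(p,\lambda)$ for any given $\sigma > 0$, we are done. For that purpose, we do the analysis in two steps:

• Firstly, we show that $\mathbb{E}[(\eta_p(X+\sigma Z;\lambda) - X)^2(Z^2-1)]$ is continuous with respect to $p$, for any given $\lambda > 0$.

• We then prove that $\mathbb{E}[(\eta_p(X+\sigma Z;\lambda) - X)^2(Z^2-1)]$ is continuous with respect to $\lambda$, uniformly over $p$.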
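Regarding the first step, note that as $p\to\tilde p$,
$$(\eta_p(X+\sigma Z;\lambda) - X)^2(Z^2-1)\,\mathbb{I}\big(|X+\sigma Z| \neq c_p\lambda^{\frac{1}{2-p}}\big) \to (\eta_{\tilde p}(X+\sigma Z;\lambda) - X)^2(Z^2-1)\,\mathbb{I}\big(|X+\sigma Z| \neq c_{\tilde p}\lambda^{\frac{1}{2-\tilde p}}\big).$$
Also, since $|(\eta_p(X+\sigma Z;\lambda) - X)^2(Z^2-1)| \le 2(|X+\sigma Z|^2 + X^2)(Z^2+1)$, and the events $\{|X+\sigma Z| = c_p\lambda^{1/(2-p)}\}$ have probability zero, we can apply the dominated convergence theorem to conclude the first step. For the second step, recall the definition in the proof of Lemma 12:
$$J(x,\lambda,p) = \mathbb{E}[(\eta_p(x+\sigma Z;\lambda) - x)^2(Z^2-1)].$$
We then have $\mathbb{E}[(\eta_p(X+\sigma Z;\lambda) - X)^2(Z^2-1)] = \mathbb{E}_X J(X,\lambda,p)$. If we can show that $\lim_{\lambda\to\lambda_0}J(x,\lambda,p) = J(x,\lambda_0,p)$ uniformly over $p$, then by (36) we can apply the uniform dominated convergence theorem to finish the proof. Hence, what is left to prove is that $J(x,\lambda,p)$ is continuous with respect to $\lambda > 0$, uniformly over $p$, for any given $x$. We rewrite $J(x,\lambda,p)$ as
$$\begin{aligned}J(x,\lambda,p) &= \int_{-\infty}^{\infty}(\eta_p(x+\sigma z;\lambda) - x)^2(z^2-1)\phi(z)\,dz\\ &= \int_{-\infty}^{\infty}\big(\lambda^{\frac{1}{2-p}}\eta_p(\lambda^{-\frac{1}{2-p}}(x+\sigma z);1) - x\big)^2(z^2-1)\phi(z)\,dz\\ &= \int_{-\infty}^{\infty}\underbrace{\frac{\lambda^{\frac{1}{2-p}}}{\sigma}\big(\lambda^{\frac{1}{2-p}}\eta_p(z;1) - x\big)^2\left[\Big(\frac{z\lambda^{\frac{1}{2-p}} - x}{\sigma}\Big)^2 - 1\right]\phi\Big(\frac{z\lambda^{\frac{1}{2-p}} - x}{\sigma}\Big)}_{=K(x,\lambda,p,z)}dz.\end{aligned}$$
Since $\sup_{0<p<1}|\eta_p(z;1)| \le |z|$ and
$$\sup_{0<p<1}\lambda^{\frac{1}{2-p}} \le \max(1,\lambda),\qquad \inf_{0<p<1}\lambda^{\frac{1}{2-p}} \ge \min(\lambda,\lambda^{\frac{1}{2}}),$$
for a given small neighborhood $B_r(\lambda_0)$, we can easily find an upper bound $L(x,z)$ such that
$$\sup_{0<p<1}|K(x,\lambda,p,z)| \le L(x,z)$$
holds for all $\lambda\in B_r(\lambda_0)$ and $\int L(x,z)\,dz < \infty$. Moreover, note that $\lambda^{\frac{1}{2-p}}$ is continuous in $\lambda$ at any $\lambda > 0$, uniformly over $p$; thus, it is easy to see that $K(x,\lambda,p,z)$ is continuous in $\lambda$ uniformly over $p$ as well. We can then apply the uniform dominated convergence theorem again to show that $J(x,\lambda,p)$ is continuous in $\lambda$, uniformly over $p$.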


C. Proof of Theorem 

According to the definition of $\eta_{p,h}$, we have
$$\eta_{p,h}(u;\lambda) = S_p(u;\lambda) + D_{p,h}(u;\lambda).$$
Hence,
$$\eta'_{p,h}(u;\lambda) = S'_p(u;\lambda) + D'_{p,h}(u;\lambda),$$
where $(\cdot)'$ denotes the derivative with respect to the first argument of the function. Let $\lambda_p$ denote the threshold specified in Lemma 5. According to the same definition, the derivative of $S_p(u;\lambda)$ is the same as the derivative of $\eta_p(u;\lambda)$ for every $|u| > \lambda_p$. Moreover, from Lemma 7 part (ii), we already know that $\sup_u|\eta'_p(u;\lambda)| < \infty$. Hence, our first conclusion is the following:
$$\sup_u|S'_p(u;\lambda)| = \sup_{|u|>\lambda_p}|\eta'_p(u;\lambda)| < \infty. \quad (42)$$


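Next, we claim that the derivative of $D_{p,h}(u;\lambda)$ with respect to $u$ is bounded as well. To prove this claim, first note that
$$D_{p,h}(u;\lambda) = \eta_p^+(\lambda_p;\lambda)\int\frac{1}{\sqrt{2\pi h}}e^{-\frac{(u-s)^2}{2h}}\mathbb{I}(s > \lambda_p)\,ds + \eta_p^-(-\lambda_p;\lambda)\int\frac{1}{\sqrt{2\pi h}}e^{-\frac{(u-s)^2}{2h}}\mathbb{I}(s < -\lambda_p)\,ds. \quad (43)$$
Therefore, it is straightforward to use the dominated convergence theorem to show that
$$D'_{p,h}(u;\lambda) = \eta_p^+(\lambda_p;\lambda)\int\frac{-(u-s)}{h\sqrt{2\pi h}}e^{-\frac{(u-s)^2}{2h}}\mathbb{I}(s > \lambda_p)\,ds + \eta_p^-(-\lambda_p;\lambda)\int\frac{-(u-s)}{h\sqrt{2\pi h}}e^{-\frac{(u-s)^2}{2h}}\mathbb{I}(s < -\lambda_p)\,ds.$$
Hence,
$$|D'_{p,h}(u;\lambda)| \le 2\eta_p^+(\lambda_p;\lambda)\int\frac{|u-s|}{h\sqrt{2\pi h}}e^{-\frac{(u-s)^2}{2h}}ds = \frac{4\eta_p^+(\lambda_p;\lambda)}{h\sqrt{2\pi h}}\int_0^\infty z\,e^{-\frac{z^2}{2h}}dz = \frac{4\eta_p^+(\lambda_p;\lambda)}{\sqrt{2\pi h}}. \quad (44)$$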
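Because the kernel in (43) is Gaussian, each integral is a normal CDF; for instance $\int\frac{1}{\sqrt{2\pi h}}e^{-(u-s)^2/(2h)}\mathbb{I}(s>\lambda_p)\,ds = \Phi((u-\lambda_p)/\sqrt{h})$. The sketch below uses this observation to evaluate $D_{p,h}$ for $0<p<1$. It assumes (as for an odd $\eta_p$) that the jump heights at $\pm\lambda_p$ are $\pm[2(1-p)\lambda]^{1/(2-p)}$, the value from Lemma 6. It then checks the bound (44) on a grid and the pointwise limit $h\to0^+$ used in the base case below. The names `D_smooth` and `D_jump` are hypothetical.

```python
import math


def std_normal_cdf(t):
    """Phi(t) for a standard normal variable."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def jump_height(lam, p):
    """Assumed jump of eta_p at the threshold (Lemma 6): [2(1-p) lam]^(1/(2-p))."""
    return (2.0 * (1.0 - p) * lam) ** (1.0 / (2.0 - p))


def threshold(lam, p):
    """lambda_p = c_p * lam^(1/(2-p)) from Lemma 5."""
    b = 2.0 * (1.0 - p)
    c = b ** (1.0 / (2.0 - p)) + p * b ** ((p - 1.0) / (2.0 - p))
    return c * lam ** (1.0 / (2.0 - p))


def D_smooth(u, lam, p, h):
    """Gaussian-smoothed jump part D_{p,h}(u; lam), written with normal CDFs (cf. (43))."""
    lp, jump, s = threshold(lam, p), jump_height(lam, p), math.sqrt(h)
    return jump * (std_normal_cdf((u - lp) / s) - std_normal_cdf((-lp - u) / s))


def D_jump(u, lam, p):
    """Unsmoothed jump part: +jump for u > lambda_p, -jump for u < -lambda_p, 0 in between."""
    lp, jump = threshold(lam, p), jump_height(lam, p)
    return jump * ((u > lp) - (u < -lp))


if __name__ == "__main__":
    for h in (1e-1, 1e-2, 1e-4):
        print(h, D_smooth(1.6, 1.0, 0.5, h), D_jump(1.6, 1.0, 0.5))
```

With this form one also sees that the constant in (44) is conservative: the largest slope is about $2\eta_p^+(\lambda_p;\lambda)\phi(0)/\sqrt{h}$ when the two smoothed steps do not overlap.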


Combining (42) and (44) proves that $\sup_u|\eta'_{p,h}(u;\lambda)|$ is bounded. Hence, by the mean value theorem, we can conclude that $\eta_{p,h}(u;\lambda)$ is Lipschitz continuous. Under the Lipschitz continuity of $\eta_{p,h}(u;\lambda)$, we can employ Theorem 1 of […] to show that
$$\lim_{N\to\infty}\frac{\|x^{t+1}(N,h) - x_o(N)\|_2^2}{N} \overset{a.s.}{=} \mathbb{E}\big(|\eta_{p,h}(X+\sigma_{t,h}Z;\lambda_t) - X|^2\big), \quad (45)$$
where $\sigma_{t,h}$ satisfies the following equation:
$$\sigma_{t+1,h}^2 = \sigma_w^2 + \frac{1}{\delta}\mathbb{E}\big(|\eta_{p,h}(X+\sigma_{t,h}Z;\lambda_t) - X|^2\big).$$

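It is straightforward to employ (45) and conclude that, for a sequence $h_i\to0^+$,
$$\lim_{i\to\infty}\lim_{N\to\infty}\frac{\|x^{t+1}(N,h_i) - x_o(N)\|_2^2}{N} \overset{a.s.}{=} \lim_{i\to\infty}\mathbb{E}\big(|\eta_{p,h_i}(X+\sigma_{t,h_i}Z;\lambda_t) - X|^2\big). \quad (46)$$
The last step is to prove that
$$\lim_{i\to\infty}\mathbb{E}\big(|\eta_{p,h_i}(X+\sigma_{t,h_i}Z;\lambda_t) - X|^2\big) = \mathbb{E}\big(|\eta_p(X+\sigma_tZ;\lambda_t) - X|^2\big), \quad (47)$$
with $\sigma_t$ satisfying
$$\sigma_{t+1}^2 = \sigma_w^2 + \frac{1}{\delta}\mathbb{E}\big(|\eta_p(X+\sigma_tZ;\lambda_t) - X|^2\big).$$
We use induction on $t$ to prove (47).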

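(i) Base of the induction: First note that $\sigma_0 = \sigma_{0,h}$. Hence, we have to prove that
$$\lim_{i\to\infty}\mathbb{E}\big(|\eta_{p,h_i}(X+\sigma_0Z;\lambda_0) - X|^2\big) = \mathbb{E}\big(|\eta_p(X+\sigma_0Z;\lambda_0) - X|^2\big). \quad (48)$$
According to Lemma 7 part (i) and (43), we have
$$|\eta_{p,h}(u;\lambda)| \le |S_p(u;\lambda)| + |D_{p,h}(u;\lambda)| \le |u| + \eta_p^+(\lambda_p;\lambda).$$
Define $\lambda_{0,p} = c_p\lambda_0^{\frac{1}{2-p}}$, where $c_p$ is the constant we defined in Lemma 5. We have
$$|\eta_{p,h}(X+\sigma_0Z;\lambda_0)| \le |X+\sigma_0Z| + \eta_p^+(\lambda_{0,p};\lambda_0). \quad (49)$$
Hence,
$$|\eta_{p,h}(X+\sigma_0Z;\lambda_0) - X|^2 \le \big(|X+\sigma_0Z| + |X| + \eta_p^+(\lambda_{0,p};\lambda_0)\big)^2.$$
Since $\mathbb{E}\big(|X+\sigma_0Z| + |X| + \eta_p^+(\lambda_{0,p};\lambda_0)\big)^2 < \infty$, if we can show that
$$\lim_{h\to0^+}\eta_{p,h}(u;\lambda) = \eta_p(u;\lambda)\quad\text{for every } u\notin\{-\lambda_p,\lambda_p\}, \quad (50)$$
then, since $X+\sigma_0Z$ has a continuous distribution, the dominated convergence theorem lets us conclude (48). To show (50), first notice that
$$\int\frac{1}{\sqrt{2\pi h}}e^{-\frac{(u-s)^2}{2h}}\mathbb{I}(s > \lambda_p)\,ds = \int_{\lambda_p-u}^{\infty}\frac{1}{\sqrt{2\pi h}}e^{-\frac{z^2}{2h}}dz.$$
Therefore, it is straightforward to confirm that
$$\lim_{h\to0^+}\int\frac{1}{\sqrt{2\pi h}}e^{-\frac{(u-s)^2}{2h}}\mathbb{I}(s > \lambda_p)\,ds = \lim_{h\to0^+}\int_{\frac{\lambda_p-u}{\sqrt h}}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-\frac{z^2}{2}}dz = \begin{cases}1 & \text{if } u > \lambda_p,\\ 0 & \text{if } u < \lambda_p.\end{cases}$$
Similarly, we can show
$$\lim_{h\to0^+}\int\frac{1}{\sqrt{2\pi h}}e^{-\frac{(u-s)^2}{2h}}\mathbb{I}(s < -\lambda_p)\,ds = \begin{cases}1 & \text{if } u < -\lambda_p,\\ 0 & \text{if } u > -\lambda_p.\end{cases}$$
Combining the two equalities above with (43) proves that $\lim_{h\to0^+}D_{p,h}(u;\lambda) = D_p(u;\lambda)$, which in turn shows $\lim_{h\to0^+}\eta_{p,h}(u;\lambda) = \eta_p(u;\lambda)$. This completes the proof of the base case.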

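(ii) Inductive step: Now we assume that (47) is true for iteration $t-1$, and our goal is to show it for iteration $t$. First note that
$$\sigma_{t,h}^2 = \sigma_w^2 + \frac{1}{\delta}\mathbb{E}\big(|\eta_{p,h}(X+\sigma_{t-1,h}Z;\lambda_{t-1}) - X|^2\big). \quad (51)$$
According to the induction hypothesis,
$$\mathbb{E}\big(|\eta_{p,h_i}(X+\sigma_{t-1,h_i}Z;\lambda_{t-1}) - X|^2\big) \to \mathbb{E}\big(|\eta_p(X+\sigma_{t-1}Z;\lambda_{t-1}) - X|^2\big).$$
Hence, $\sigma_{t,h_i}^2 \to \sigma_t^2$ as $i\to\infty$. Moreover, note that
$$\eta_{p,h_i}(X+\sigma_{t,h_i}Z;\lambda_t) = S_p(X+\sigma_{t,h_i}Z;\lambda_t) + D_{p,h_i}(X+\sigma_{t,h_i}Z;\lambda_t),$$
$$\eta_p(X+\sigma_tZ;\lambda_t) = S_p(X+\sigma_tZ;\lambda_t) + D_p(X+\sigma_tZ;\lambda_t).$$
Since $S_p(u;\lambda)$ is a continuous function of $u$, we have
$$\lim_{i\to\infty}S_p(X+\sigma_{t,h_i}Z;\lambda_t) = S_p(X+\sigma_tZ;\lambda_t).$$
Furthermore, it is not hard to see that the arguments we used to prove $D_{p,h}(u;\lambda)\to D_p(u;\lambda)$ in step (i) can be applied to show $D_{p,h}(u_h;\lambda)\to D_p(u;\lambda)$ if $u_h\to u\notin\{-\lambda_p,\lambda_p\}$ as $h\to0^+$. Therefore, we obtain, almost surely,
$$\lim_{i\to\infty}D_{p,h_i}(X+\sigma_{t,h_i}Z;\lambda_t) = D_p(X+\sigma_tZ;\lambda_t).$$
Combining the last two equalities, we have shown that, almost surely,
$$\lim_{i\to\infty}\eta_{p,h_i}(X+\sigma_{t,h_i}Z;\lambda_t) = \eta_p(X+\sigma_tZ;\lambda_t).$$
Since $\sigma_{t,h_i}$ is bounded, we can use calculations similar to those in step (i) to bound $|\eta_{p,h_i}(X+\sigma_{t,h_i}Z;\lambda_t) - X|^2$. Hence, the dominated convergence theorem can be applied to conclude
$$\lim_{i\to\infty}\mathbb{E}\big(|\eta_{p,h_i}(X+\sigma_{t,h_i}Z;\lambda_t) - X|^2\big) = \mathbb{E}\big(|\eta_p(X+\sigma_tZ;\lambda_t) - X|^2\big).$$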


D. Proof of Proposition 

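We have already proved in Theorem 12 that $\Psi_{\lambda^*,p}(\sigma^2)$ is a continuous function of $\sigma^2$ (we have in fact proved that it is differentiable). We consider the noiseless setting $\sigma_w = 0$. The proof for the noisy setting is essentially the same. First, note that for $\sigma = 0$ we have $\Psi_{\lambda,p}(0) = 0$ for every $\lambda$. Hence, $\Psi_{\lambda^*,p}(0) = 0$. Therefore, $\sigma^2 = 0$ is a fixed point of $\Psi_{\lambda^*,p}$. If it is a stable fixed point, the claim is established. Hence, assume that it is an unstable fixed point. Then there exists a value of $\sigma$, called $\sigma_u$, for which
$$\Psi_{\lambda^*,p}(\sigma_u^2) > \sigma_u^2. \quad (52)$$
Furthermore, we will show that for $\sigma^2 > \frac{1}{\delta}[\mathbb{E}(X^2) + 1]$ we have
$$\Psi_{\lambda^*,p}(\sigma^2) < \sigma^2. \quad (53)$$
Since $\Psi_{\lambda^*,p}(\sigma^2)$ is continuous, we can combine (52) and (53) and conclude the existence of a stable fixed point in the range $[\sigma_u^2, \frac{1}{\delta}(\mathbb{E}(X^2) + 2)]$. Hence, the only step that is left is to prove (53). Note that from Lemma 7 we have $|\eta_p(X+\sigma Z;\lambda) - X| \le |X+\sigma Z| + |X| \le 2|X| + \sigma|Z|$. Since $\mathbb{E}(2|X| + \sigma|Z|)^2$ is bounded, we can employ the dominated convergence theorem to get
$$\lim_{\lambda\to\infty}\mathbb{E}(\eta_p(X+\sigma Z;\lambda) - X)^2 = \mathbb{E}\lim_{\lambda\to\infty}(\eta_p(X+\sigma Z;\lambda) - X)^2 = \mathbb{E}(X^2).$$
Hence, there exists a value $\lambda_u < \infty$ such that $\mathbb{E}(\eta_p(X+\sigma Z;\lambda_u) - X)^2 < \mathbb{E}(X^2) + 1$. Therefore,
$$\Psi_{\lambda^*,p}(\sigma^2) \le \frac{1}{\delta}\mathbb{E}(\eta_p(X+\sigma Z;\lambda_u) - X)^2 < \frac{\mathbb{E}X^2 + 1}{\delta} < \sigma^2,$$
which implies (53) and completes the proof.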
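E. Proof of Theorem

Let $X\sim(1-\epsilon)\Delta_0 + \epsilon G$, where $G$ is an arbitrary distribution that does not have any mass at zero. Also, let $U$ denote a random variable with distribution $G$, and let $\mathbb{E}_U$ be the expectation with respect to $U$. Define
$$\psi_{\tau,p}(\sigma^2) \triangleq \frac{1}{\delta}\mathbb{E}\big(\eta_p(X+\sigma Z;\tau\sigma^{2-p}) - X\big)^2.$$
Note that if $\tau = \frac{\lambda^*(\sigma)}{\sigma^{2-p}}$, then
$$\psi_{\tau,p}(\sigma^2) = \Psi_{\lambda^*,p}(\sigma^2). \quad (54)$$
We have
$$\begin{aligned}\psi_{\tau,p}(\sigma^2) &= \frac{1}{\delta}\Big[(1-\epsilon)\mathbb{E}\big(\eta_p^2(\sigma Z;\tau\sigma^{2-p})\big) + \epsilon\mathbb{E}\big(\eta_p(U+\sigma Z;\tau\sigma^{2-p}) - U\big)^2\Big]\\ &\overset{(a)}{=} \frac{\sigma^2}{\delta}\Big[(1-\epsilon)\mathbb{E}\big(\eta_p^2(Z;\tau)\big) + \epsilon\mathbb{E}\big(\eta_p(U/\sigma+Z;\tau) - U/\sigma\big)^2\Big]\\ &= \frac{\sigma^2}{\delta}\Big[(1-\epsilon)\mathbb{E}\big(\eta_p^2(Z;\tau)\big) + \epsilon\mathbb{E}_U\mathbb{E}_Z\big(\eta_p(U/\sigma+Z;\tau) - U/\sigma\big)^2\Big]\\ &\le \frac{\sigma^2}{\delta}(1-\epsilon)\mathbb{E}\big(\eta_p^2(Z;\tau)\big) + \frac{\sigma^2}{\delta}\epsilon\sup_{\mu\ge0}\mathbb{E}_Z\big(\eta_p(\mu+Z;\tau) - \mu\big)^2,\end{aligned}$$
where equality (a) is due to Lemma 3, and the last inequality uses the fact that, by Lemma 3 part (i), $\mathbb{E}_Z(\eta_p(\mu+Z;\tau) - \mu)^2$ is an even function of $\mu$. It is straightforward to employ (54) and derive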

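$$\frac{\Psi_{\lambda^*,p}(\sigma^2)}{\sigma^2} = \inf_{\tau>0}\frac{\psi_{\tau,p}(\sigma^2)}{\sigma^2} \le \frac{1}{\delta}\underbrace{\inf_{\tau>0}\sup_{\mu\ge0}\Big[(1-\epsilon)\mathbb{E}\big(\eta_p^2(Z;\tau)\big) + \epsilon\mathbb{E}_Z\big(\eta_p(\mu+Z;\tau) - \mu\big)^2\Big]}_{M_p(\epsilon)}.$$
Hence, if $M_p(\epsilon) < \delta$, then the inequality above implies that $\Psi_{\lambda^*,p}(\sigma^2) < \sigma^2$ for any $\sigma > 0$, meaning that $\Psi_{\lambda^*,p}$ does not have any fixed point except at zero, and that fixed point is stable. Now we prove the second part of the theorem. Suppose that
$$\underline{M}_p(\epsilon) > \delta.$$
We would like to show that there exist certain distributions of the form $(1-\epsilon)\Delta_0 + \epsilon G$ for which $\Psi_{\lambda^*,p}(\sigma^2)$ has a nonzero stable fixed point. Suppose that $X$ has the distribution $(1-\epsilon)\Delta_0 + \epsilon\Delta_1$, where $\Delta_a$ denotes a point mass at $a$. Then $\Psi_{\lambda^*,p}(\sigma^2)/\sigma^2$ can be written as
$$\frac{\Psi_{\lambda^*,p}(\sigma^2)}{\sigma^2} = \inf_{\tau>0}\Big[\frac{1-\epsilon}{\delta}\mathbb{E}\big(\eta_p^2(Z;\tau)\big) + \frac{\epsilon}{\delta}\mathbb{E}\big(\eta_p(1/\sigma + Z;\tau) - 1/\sigma\big)^2\Big].$$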


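For notational simplicity, assume that $\sup_{\mu>0}\inf_{\tau>0}\big[(1-\epsilon)\mathbb{E}(\eta_p^2(Z;\tau)) + \epsilon\mathbb{E}(\eta_p(\mu+Z;\tau) - \mu)^2\big]$ is achieved at $\mu^*$,§ and define $\sigma_0 = 1/\mu^*$. We then have
$$\frac{\Psi_{\lambda^*,p}(\sigma_0^2)}{\sigma_0^2} = \frac{\underline{M}_p(\epsilon)}{\delta} > 1. \quad (55)$$
This, combined with (54), implies that
$$\Psi_{\lambda^*,p}(\sigma_0^2) > \sigma_0^2.$$
Also, according to (53), we know that if $\sigma^2 > \frac{1}{\delta}[\mathbb{E}(X^2) + 1]$, then
$$\Psi_{\lambda^*,p}(\sigma^2) < \sigma^2.$$
Hence, by the continuity of $\Psi_{\lambda^*,p}(\sigma^2)$ (proved in Theorem 12), we conclude that $\Psi_{\lambda^*,p}(\sigma^2)$ has a stable fixed point at some $\sigma^2 > \sigma_0^2$. Therefore, for the distribution $(1-\epsilon)\Delta_0 + \epsilon\Delta_1$, $\Psi_{\lambda^*,p}(\sigma^2)$ has at least one nonzero stable fixed point.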


§ If $\mu^*$ is infinite, then we can use the same technique, but we should show that zero is an unstable fixed point.


































F. Proof of Corollary 7

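Define
$$\epsilon_*(\delta) \triangleq \inf\{\epsilon : M_p(\epsilon) \ge \delta\}.$$
First note that it is straightforward to show that $M_p(1) = 1$. Hence, $\{\epsilon : M_p(\epsilon) \ge \delta\}$ is not empty. It is clear that for $\epsilon < \epsilon_*(\delta)$ we have $M_p(\epsilon) < \delta$. Combining this with the theorem proved in Section E establishes the first part of our result. For the second part of the corollary, define
$$\epsilon^*(\delta) = \sup\{\epsilon : \underline{M}_p(\epsilon) \le \delta\}.$$
Since $\underline{M}_p(0) = 0$, the set $\{\epsilon : \underline{M}_p(\epsilon) \le \delta\}$ is not empty. Furthermore, if $\epsilon > \epsilon^*(\delta)$, then $\underline{M}_p(\epsilon) > \delta$. According to the theorem proved in Section E, there exists a distribution for which the recovery of optimally tuned $\ell_p$-AMP is not successful.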


G. Proof of Lemma

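For any given $\tau > 0$, we have
$$\frac{d\,\mathbb{E}(\eta_1(\mu+Z;\tau) - \mu)^2}{d\mu} = 2\mathbb{E}\big[(\eta_1(\mu+Z;\tau) - \mu)(\mathbb{I}(|\mu+Z| > \tau) - 1)\big] = 2\mu\,\mathbb{E}\big(\mathbb{I}(|\mu+Z| \le \tau)\big) > 0,$$
for any $\mu > 0$. Hence, $\mathbb{E}(\eta_1(\mu+Z;\tau) - \mu)^2$ is an increasing function of $\mu$ over $[0,\infty)$. This implies that
$$\begin{aligned}M_1(\epsilon) &= \inf_\tau\sup_\mu\,(1-\epsilon)\mathbb{E}(\eta_1(Z;\tau))^2 + \epsilon\mathbb{E}(\eta_1(\mu+Z;\tau) - \mu)^2\\ &= \inf_\tau\lim_{\mu\to\infty}\,(1-\epsilon)\mathbb{E}(\eta_1(Z;\tau))^2 + \epsilon\mathbb{E}(\eta_1(\mu+Z;\tau) - \mu)^2\\ &= \inf_\tau\,(1-\epsilon)\mathbb{E}(\eta_1(Z;\tau))^2 + \epsilon(1+\tau^2).\end{aligned}$$
The last equality is obtained by the dominated convergence theorem (the details can be found in the proof of Theorem […]). On the other hand, we know
$$\begin{aligned}\underline{M}_1(\epsilon) &= \sup_\mu\inf_\tau\,(1-\epsilon)\mathbb{E}(\eta_1(Z;\tau))^2 + \epsilon\mathbb{E}(\eta_1(\mu+Z;\tau) - \mu)^2\\ &\ge \lim_{\mu\to\infty}\inf_\tau\,(1-\epsilon)\mathbb{E}(\eta_1(Z;\tau))^2 + \epsilon\mathbb{E}(\eta_1(\mu+Z;\tau) - \mu)^2\\ &\overset{(a)}{=} \inf_\tau\,(1-\epsilon)\mathbb{E}(\eta_1(Z;\tau))^2 + \epsilon(1+\tau^2),\end{aligned}$$
where (a) is a direct implication of the proof of Lemma 21 (by setting $X = (1-\epsilon)\Delta_0 + \epsilon\Delta_1$). Thus, we have shown $M_1(\epsilon) \le \underline{M}_1(\epsilon)$. Moreover, we can easily see $M_1(\epsilon) \ge \underline{M}_1(\epsilon)$ from their definitions. So we can conclude $M_1(\epsilon) = \underline{M}_1(\epsilon)$. Now we would like to prove that $M_1(\epsilon)$ is an increasing function of $\epsilon$. First note that $(1-\epsilon)\mathbb{E}(\eta_1(Z;\tau))^2 + \epsilon(1+\tau^2)$ is a strictly convex function of $\tau$ and has a unique global minimum, denoted by $\tau^* > 0$. Note that the subgradient of $M_1(\epsilon)$ with respect to $\epsilon$ is $1 + (\tau^*)^2 - \mathbb{E}(\eta_1(Z;\tau^*))^2 > 0$. Therefore, $M_1(\epsilon)$ is a strictly increasing continuous function. Hence, $\epsilon_*(\delta) = \epsilon^*(\delta)$.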
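For $p = 1$ the quantity above is explicit, because $\mathbb{E}(\eta_1(Z;\tau))^2 = \mathbb{E}(|Z|-\tau)_+^2 = 2[(1+\tau^2)Q(\tau) - \tau\phi(\tau)]$ with $Q(\tau) = \mathbb{P}(Z > \tau)$ (a standard Gaussian integral, given here as our own computation). The sketch below evaluates $M_1(\epsilon)$ by ternary search over $\tau$, which the strict convexity noted in the proof justifies. It then inverts the increasing map $\epsilon\mapsto M_1(\epsilon)$ by bisection to obtain the common value of $\epsilon_*(\delta)$ and $\epsilon^*(\delta)$. The function names and the search range `tau_max` are illustrative assumptions.

```python
import math


def e_soft_sq(tau):
    """E[eta_1(Z; tau)^2] = 2[(1 + tau^2) Q(tau) - tau phi(tau)] for Z ~ N(0, 1)."""
    q = 0.5 * math.erfc(tau / math.sqrt(2.0))  # Q(tau) = P(Z > tau)
    phi = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
    return 2.0 * ((1.0 + tau * tau) * q - tau * phi)


def M1(eps, tau_max=20.0, iters=200):
    """min over tau in [0, tau_max] of (1 - eps) E[eta_1(Z;tau)^2] + eps (1 + tau^2).

    The objective is strictly convex in tau, so ternary search is valid; for eps = 0 the
    infimum is only approached as tau -> infinity and the value at tau_max is returned.
    """
    obj = lambda t: (1.0 - eps) * e_soft_sq(t) + eps * (1.0 + t * t)
    lo, hi = 0.0, tau_max
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if obj(m1) < obj(m2):
            hi = m2
        else:
            lo = m1
    return obj(0.5 * (lo + hi))


def eps_threshold(delta, iters=100):
    """Bisection for the eps at which the increasing function M1 crosses delta (0 < delta < 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if M1(mid) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for delta in (0.1, 0.3, 0.5, 0.7):
        print(delta, eps_threshold(delta))
```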


H. Proof of Theorem

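1) Main part: For any $\sigma > 0$ and any thresholding policy $\lambda(\sigma)$, define $\tau(\sigma) = \frac{\lambda(\sigma)}{\sigma^{2-p}}$. Also, let $\tau^*(\sigma)$ denote the optimal value of $\tau(\sigma)$, given by $\tau^*(\sigma) = \frac{\lambda^*(\sigma)}{\sigma^{2-p}}$. In the rest of the proof, we write $\eta_p(X+\sigma Z;\lambda(\sigma))$ as $\eta_p(X+\sigma Z;\tau(\sigma)\sigma^{2-p})$. This will enable us to employ the scale invariance properties of the proximal function, proved in Lemma 3, more efficiently. Since it is easier to work with $\tau(\sigma)$, we use the notation
$$\psi_{\tau,p}(\sigma^2) = \frac{1}{\delta}\mathbb{E}\big(\eta_p(X+\sigma Z;\tau(\sigma)\sigma^{2-p}) - X\big)^2. \quad (56)$$
Clearly, we have
$$\psi_{\tau^*,p}(\sigma^2) = \Psi_{\lambda^*,p}(\sigma^2).$$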
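Note that $\sigma^2 = 0$ is actually a fixed point of $\psi_{\tau^*,p}(\sigma^2)$. Furthermore, it is straightforward to see that $0$ is a stable fixed point if and only if
$$\frac{d\psi_{\tau^*,p}(\sigma^2)}{d\sigma^2}\bigg|_{\sigma^2=0} < 1. \quad (57)$$
Consider a specific thresholding policy $\lambda(\sigma) = \beta\sigma^{2-p}$, where $\beta > 0$ is a fixed number, and define
$$\psi_{\beta,p}(\sigma^2) \triangleq \frac{1}{\delta}\mathbb{E}\big(\eta_p(X+\sigma Z;\beta\sigma^{2-p}) - X\big)^2. \quad (58)$$
We then have
$$\frac{d\psi_{\tau^*,p}(\sigma^2)}{d\sigma^2}\bigg|_{\sigma^2=0} = \lim_{\sigma^2\to0}\frac{\psi_{\tau^*,p}(\sigma^2)}{\sigma^2} \le \lim_{\sigma^2\to0}\frac{\psi_{\beta,p}(\sigma^2)}{\sigma^2}, \quad (59)$$
where the last inequality is due to the fact that $\lambda^*$ (or $\tau^*$) is the optimal thresholding policy, and hence $\psi_{\tau^*,p}(\sigma^2) \le \psi_{\beta,p}(\sigma^2)$ for every $\beta > 0$ and $\sigma^2$. Since (59) holds for every $\beta > 0$, we have
$$\frac{d\psi_{\tau^*,p}(\sigma^2)}{d\sigma^2}\bigg|_{\sigma^2=0} \le \inf_{\beta>0}\lim_{\sigma^2\to0}\frac{\psi_{\beta,p}(\sigma^2)}{\sigma^2}. \quad (60)$$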

Let X ~ (1 — e)Ao + eG, where G is an arbitrary distribution that does not have any point mass at zero. Also, let 
U denote a random variable with distribution G. Then we know 


1 


= ^[{l-e)E{r]p{aZ-,Pa^-P)y + eE{rjp{U + aZ-,Pa^-P)-Uy 


a 


2 r 


<5 L 


(1 - e)E{rjp{Z; P)f + eE(r?p(f7/c7 + Z; /3) - U/af 


where the second equality is due to Lemma Hence we have 

_ 1 


(t2-s.o 6 5 (t2_^o 


= -(l-e)E{r^p{Z-p)Y + - hm E{r^p{U/a + Z-p) - U/aY. 


(61) 


Our next goal is to show that we can interchange the limit and expectation above. Define Vp{u] P) = r]p{u] P) — u. 
So we can wrife 

E{r]p{U/a + Z-p)-U/aY = E{Z + Vp{U / a + Z-p)f 

= l+E{vp{U/a + Z;P)Y+ 2E{Zvp{U/a + Z-p)). (62) 

From Corollary 1^ and 1^ we know \vp{u]P)\ < CpP^. So we can gel lhat {vp{U/a + Z; P)Y < CpP^ and 
\Zvp{U/a + Z; P)\ < CpP^\Z\. We can fhen employ fhe dominaled convergence Iheorem lo conclude lhal 

lim E{vp{U/a + Z; P)Y = E lim {vp{U/a + Z; P)Y = 0, 

0-2^0 0-2^0 

lim E{Zvp{U/a + Z; /3)) = E lim {Zvpipjja + Z; P)) = 0, (63) 

(t2_^0 (t2_^Q 

where fhe second equalilies in fhe Iwo lines above is a slraighfforward resull of Lemma Combining ( |60l ), ( |6T] ), 
( |62l ), and ( |6^ implies fhal 

dtpT.,pier‘s) 


da'^ 


1 - e 


< inf 

.2=0 


E{r^p{Z-p)Y+ - = 


( 64 ) 


So far we have proved an upper bound for fhe derivalive of '0T.,p(o'^) al cr = 0. Our next step is to show that 


dipT^picrY 


da'^ 


.2=0 


e 

> -. 
- <5 
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Note that 


a ^=0 

= lim inf - 
cr-s>0/3>0 0 

1 

> - 


> 


(1 - e)E(r/p(Z; (3))^ + eE{r,p{U/a + Z; /3) - U/af 
lim infj(l - 6)E(r?p(Z;/3))¥(|C/| > ^i) + eE[(r?p(C//a + Z;/3) - U/af • 1(|[/| > ^)] } 

{7^p{U/a + Z-,(3)-U/a)^-irnyfi) 


6 o--s>0 /3>0 


E 


lim inf < (1 - e)E{rip{Z-, (3)) + e- 

cr-S>0 /3>0 ' ' 'l''- 


n\u\>^^) 


(65) 


where fi is an arbitrary positive number that satisfies E(|f7| > /r) > 0. Our next step is to prove that 


E 


lim inf < (1 - e)E{r]p{Z; /3)Y + e- 

cr-s.0/3>0 ' ' ^ 


{rJp{U/a + Z■,(3)-U/aY■l{\U\>^l) 


P(|[/| >/i) 


= e. 


( 66 ) 


Since this requires more work, we postpone its proof until Section VI-H2 and we discuss how ( |55) and ( |56| ) finish 
fhe proof of Theorem By combining ( |65| ) and ( [6^ we obfain 

d'tpT,,p{crY 

0-2=0 




> lim ^P(|f7| >n) = ^. 

, 2 _n ^0 0 0 


(67) 


Combining (|64l) and (^i proves that 


dV’r.,p(o-^) 


dcr^ 


0-2=0 


As we discussed before, 0 is a stable fixed point if and only if 


dipT.,p{(yY 


da'^ 


0-2=0 


= 5<i- 


The only step that is still unresolved in the proof of Theorem]^ is (^l. Since the proof is different for 0 < p < 1 
and p = 0, we prove them in two different sections below, i.e.. Section [VI-H2| and |VI-H3| respectively. 


2) Auxiliary result for 0 < p < I; As we discussed before our goal in this section is to prove Equation 
for every 0 < p < 1. Below we prove a stronger result, since this stronger version will be used in other proofs 
throughout the paper. Define 

Rp{T,cr) = (1 - e)Ep2(2';r) + eE(pp(f7/cr + Z;r) - U/af, 

where Z ~ A^(0,1) and U ^ G are independenf. Denote the optimal r that minimizes Rp{T,a) by r*(iT). Also, 
let a: ~ (1 - e)Ao + eG, and define Eg(/((/)) = f f(u)dG(u) and Pg(U e A) = EG(I(f7 G B)). 

Proposition 4. Suppose Pg(|( 7| > p) = 1 with p being a fixed positive number and Eg\U\^ < oo. Then, for 
0 < p < 1, we have 

-Rp(A(cr),cr) = e + ep^E|[/p^“^(n((T))^(T^“^^ + o((n((T))^cr^“^^), 
where the convergence rate ofT^{a) can be characterized by 

^ (l-e)cpPp(cp;l) 

(A(a))^(/.(cp(r*(a))^) ~ ep2(2_p)E|C/|2p-2- 

Before we prove this result note that as ct —)• 0, Rp{T^{a),a) —)• e, and this implies ( [6^ we required to prove 
Theorem In this proposition we go one step further, and characterize the second dominant term as well (in terms 
of a), since it will be used in the proofs of other results later in our paper. 
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We prove Proposition in three steps. We first show T^{a) goes off to infinity, hut not very fast, as cr —)■ 0. 
This will he done in Lemma Then, we characterize the exact rate of r* (cr) in terms of cr —)• 0. This will he 
performed in Lemma 16 Finally we use this result to prove Proposition 


Lemma 15. Suppose E|Xp < oo, then for 0 < p < 1, r*((T) —)■ oo and r*(cr)fT^ ^ 0, a —)• 0. 


Proof: If T^{a)a'^~P 0, then there exists a sequence crfc —>• 0 and a constant c > 0 such that > c, 

for all k. Choose a convergent subsequence {cr^^} and denote lim^^^oo = a > c (note a can he +cx)). 

We use Fatou’s lemma to get 


lira ini E{pp{X + a k„Z;al ^n{akj) - X)^ > Eliminf{pp{X + ak^Z; ^n{akj) - Xf 

kn^OO ^ kn^OO "■ 


r‘^-Pn 


> Emm{{r]p{X;a)-Xf,X^)>0 


Hence, we have 


liminfE(7/p(X/crfc„ + Z;T*(crfc„)) - = lim ^ • liminf E(r/p(X + a? ^r*(c 7 fc„)) - X) 


kn^OO 


kn^OO 


= +00 


which implies liminffc^_^oo Rp{'P*{^k^)-i^k^) = + 00 . However, since is the optimal thresholding value, we 

know Rp{T^{ak^),ak^) < Rp{i),akf) = 1, for every /c„. This is a contradiction. Similarly, if t^{cj) -»■ 00 , there 
exists a sequence < 7 ^ —)• 0 and a finite constant a > 0 such that T^{ak) —>■ a. By similar arguments as in the 


previous proof (see (64i for example), we can apply dominated convergence theorem to obtain. 


lim Rp{n{cFk),CFk) = (1 - e)Eril{Z;a) + e > e. 


k^oo 


( 68 ) 


On the other hand, since T^{<7k) is the optimal thresholding value, we know 

lim Rp{n{ak),crk) < lim Rp{/3,ak) = (1 - e)E 77 p(Z;/?) + e, 


k^oo 


k^oo 


for any finite (5. Letting /3 —)> 00 on both sides of the above inequality yields 

lim Rp{n{(Jk),crk) < €, 
k^oo 


which contradicts ( 681 . 


Lemma 16. Suppose > Z^) = 1 with p being a fixed positive number and Eg| C/p < 00 , then for every 

0 <p <1, 

{l-e)cp{r]+{cp-,l)f 


lim 

(7^0 


f{cp{n{a))^p) ep2(2-p)E|C/|2p 2 ' 


Proof: We recall some properties of the proximal operator pp{u; X) that will be used multiple times in the 
proof. For further information, see the proofs of Lemmas [7] and 

(a) ^ ^ 


i+^p(p-i)pf (w;A)’ 

^ , for u > CpA^. 


X dpRu-p) _ _ 

'' du 1+Ap(p—1)»7 p“^(u;A) ’ ^ 

(c) u — Pp{u] A) = pXpp~^{u] A), for u > CpX^. 

Let F{u) denote the CDF of \U\. We first decompose Rp{T,a) to the following terms: 

POO POO POO 

Rp{T,a) = 2(1-e)/ Pp{z-,T)(j){z)dz + e / 

Jc„r‘^-p J u J—I 


POO POO 

' fL Ju/a+CpT^' 


f fL J—uf<J+CpT'^- 


1 {pp{u/a + z-,t) — u/a)‘^(l){z)dzdF{u) + 


POO P 

^_{r]p{-u/a + z]t) + u/a)‘^f{z)dzdF{u) + e / / 

^ J U J - 


OO p—ula-\-CpT‘^-p ^2 


^ -^(f){z)dzdF {u) 
p J —uja—CpT'^-p ^ 


— i?l + i?2 + Rti + Ri 


(69) 
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From the proof of Lemma 15 it is straightforward to see that r*(cT) is non-zero and finite. Since r*(cr) is the 
optimal thresholding value and^^^^^^ is differentiable (according to Lemma 131, we conclude that r*((T) satisfies 
dRp(T^(a-),a-) _ Q nofafional simplicify, helow we use r and r* inferchangeahly. Now we analyze fhe parfial 
derivafive of fhe four ferms in ([69l) separafely. For fhe firsf ferm, 


dRi — 2(1 — e)cpT^-p 


2 — p 


, n 

(V(cpT 2 -p;r)) (/)(cpr 2 -») - 4(1 - e)p / ^ - 

Jc„T^-P 1 


Vpiz-,T) 


Cpr 2 -P l + Tp{p-l)rf^ 


-(l){z)dz, (70) 


where we have used property (^ We now compare the order of the two terms on the right hand side of the above 
equality. According to Lemma ^ we can conclude that 1 + Tp{p — l)r)p~‘^(z;T) is bounded away from zero, for 
z > CqT^. Hence, combining with the fact \rip{z]T)\ < \z\ (according to Lemma |^, we know there exists a 
positive constant C such that 

poo 

Jcr,T^-P 1 


r]piz-,T) 


- 4 >{z)dz 


+ Tp{p-l)p^ ^{z-r) 

< C f ^ z^(j){z)dz = Cc‘^~^T^(p{cpT^) + C I ^ {p — l)z^~‘^4'{z)dz 


roo 


fCr,T^-P 


< 


-1 1 

Cc^~^T^(j){cpT^) + C{p - ^ (j){z)d. 

J Cr,T^-P 


c„t2-p 

'z 


p— 1 1 p —3 1 p— 1 1 

< 0{t^-p (t){CpT'^-P )) + 0{t^-p 4>{CqT^-P )) = 0{t^-p (j){CpT^-P )). 

To obtain Equality (i) we used integration by parts. To obtain Inequality (ii), we have used (j){z)dz i</>(i)- 
as f —)■ cx). Now we discuss the order of the first term in ( fTO] ). Since according to Lemma 3 rj^{cpT^-,T) = 

1 p+1 1 '—I 

T^-pr]j{cp] 1), we know the first term is of order r^-p 4>icpT^-p). Hence, we can conclude that 


dRi p+i ^ -2(l-e)cp(??+(cp;l))2 


(71) 


For the last term R 4 , we can do the following calculations: 


dRi 

Ih 


= E 


ell'^CnT^-P 


0-2(2 -p) 


■{(t){-U/a + CpT^-p) + 4>{-U/a - CpT^-p)) 


< 


P-1 

2 ecr,r2-p 


- rr2 


0 - 2(2 — p) 


E[U‘^(j)(cpT^ - \U\/a)]< 


P-1 

2eCpT'^-p /_\TLr.r 7-2 


0 - 2(2 — p) 


(l){cpT'^-p — p,/a)EU^, 


where the last inequality is based on |C7|/o- > pja » CpT^-p from Lemma 15 Again using \p\/<7 S> CpT^-p, it is 
straightforward to confirm fhaf 

lim '»(-P/P + Y^) ^ 0 . 

cr-5-0 a'^(j)(^CpT^-p) 


Therefore, we have 


, dRi ,, p±i ,, ^ X, 

lim —— /(T^-p oicyT^-p )) = 0 . 
dr ^ 


( 72 ) 
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We now discuss the calculation of We have 


dR2 

dr 


J {r]^{cpT^-p-,T) - u/a)‘^(l){-u/a + CpT^-p)dF{u) + 


/:/: 
/:/ 


2 e 


' fi J—ula-\-CpT^-p 
r*oo poo 


1 + z;t) — u/a — z)d2r]p{u/a + z] T)(p{z)dzdF{u) 


CCrt P — 


/i J —ujcr+Cpr^ 
oo 


1 zd2r]p{u/a + z‘,T) 4 >iz)dzdF{u) 


2 — p 

2eTp‘ 


a /■“ n ^ 

/ (Pp icpT^-p;T) - u/a) (l){-u/a + CpT^-p)dF{u) + 

J u 

r]P’~‘^{uja + z;t) 


'IJ' 
r*00 poo 


a 


p J-u/a+c^T^-p l + Tp{p-l)r]^ “^{u/a + z\t) 


(f){z)dzdF{u) + 


—2ep 


/:/ 


ri^ \u/a + z;T) 


At J-u/a+c^T^-P I + Tp{p- 1)7]^ '^{u/a + Z-,T) 


z 4 >{z)dzdF{u) 


^ Si+ 82 + 83 . 


We have used properties (a) and (c) in the above derivations. We then analyze the above three terms separately. 
For S3, integration by parts combined with property (b) gives 


^3 = 


-2ep(r/+(cpr2-t>;T))P 


-1 


1 + Tp{p - l){r]^{CpT^-P-,T))P 


roo 

/ (l>{-u/ 

J u 


roo poo 


-2ep{p - 1 ) 


+2ep^(p - 


—u/a + CpT^-p )dF{u) 

_ r]p~^iula + z;t) _ 

J-u/a+CpT'^ {l + Tp{p-l)r]^~‘^{u/a + z-,T)y 

ril^~‘^{u/a + z\t) 


POO PC 

{p-1){p-2)t / 

J u — 1 


'll J-u/a+CpT^-p {I + Tp{p - l)r]^ ‘^(u/a + z;t))^ 

Choosing a positive constant 0 < v < p, note that 


4 >{z)dzdF{u) 

(l>{z)dzdF{u)=Ti+T2 + T3. 


T 2 


(T‘ 


^ = -2ep(p-l) f 

J U 

—2ep{p — 1)E 


00 ,-u/a+v/a 


II J-u/a+CpT^-P {I + Tp{p - l)r]f, "^{u/a + z-,t)Y 

\{Z + \U\/a > V/a)'iql~‘^{\U\ + aZ] a‘^~PT) 


4 >{z)dzdF{u) 


(1 + Tp{p - l)r]^ + Z-,T)f 


(73) 


It is straightforward to check that when u > CpT^-p, there exists a positive constant Cq such that 1 + Tp{p — 
t) > Cq > 0. Also since rjp{u-, r) is a non-decreasing function of tt > 0, we can have 


1 {Z + \U\/a > v/a)rjp ^(|f 7 | + crZ; cr^ ^r) 


(1+ rp(p-l)ry^ ^(|C/|/cr + Z;r))" 


< iC^o)-Vp-\v,a^-PT) < C^%Pp-\v;l), 


for sufficiently small a (recall r]p{u\ r) is a non-increasing function of r when r]p{u; r) > 0). Because —) 

0, as cr —;■ 0 from Lemma [T^ we can easily see 

1{Z + \U\/a > v/a)7%~‘^{\U\ + aZ] a'^-Pr) ^ l(o-Z + \U\> v)'n^~^{\U\ + aZ; a'^-Pr) ^ |Anp-2 


lim 


(1-|-Tp(p — l)77p + Z;T)y <^-^0 (1-|- PTp{p — l)r]p ‘^{\U\ + aZ-,a‘^ 

We can then use dominated convergence theorem to conclude, 

,1(Z -I- |f7|/cT > v/a)r^~^{\U\ + aZ-,a‘^~PT) 




lim E- 


^^0 ( 1 -hTp(p-1)7/^ ^(|[/|/(T-hZ;r)) 


= E\U\ 


p-2 


(74) 
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Moreover, we can use similar arguments to obtain, 

OO n-u/a+vla ^p-2^p-2 j^ ^ 


Ifj, J-u/a+c^T^-p {l + Tp{p-l)r]'^ (u/a + z-r)) 


-(f){z)dzdF{u] 


< Cq^t \rj^{cp-,l)y 2 


bCXD p — ula+vl<J 


' —ujo-y-CpT 2-P 


^ (j){z)dzdF{u) 


< Cq'^t ^(r/+(cp; 1 ))p '^{v/a - CpT^-p)(l){-p/a + v/a) ^ 0 , as a 0 , 


(75) 


where the last inequality uses the fact that f ^ (f)(^z)dz < {vja — CpT'^-p)(l){—ula + vja) and that u > p. 

—uju-\-CpT'^~'P ^ 

Combining ( |7^ , ( |74| ) and (751 we have 

lim = -2ep(p - 1)1K\U\p~^. 

(T-s>o P 

Since T 3 and S 2 admit similar integral forms as T 2 ’s, we can follow similar calculation steps to derive, 
lim 


-n Sf- = - 1)(P - lim = 26p2E|C/|2p-2. 

(j^o a* o--s>o (T^ 


Furthermore, by applying Lemma [T^ it is not hard to see 


(T-s-O (T^ P 

Combing the results about Ti,T 2 and T 3 , we have 

lim = lim , + lim —h lim 


lim = 0 , lim = 0 . 


a^o cr^ P 
Ts 


—^-O CJ^ P IT—^0 (T^ P cr—>-0 P cr—>-0 ‘^P 


= -2ep(p- 1 )E|C/|p-2. 


Putting (76 1 , (77 1 and (|78|) together, we obtain the order of 


dRi 

dr 


lim = 2 ep 2 E|C/| 2 p- 2 . 

(T^O OT 


(76) 


(77) 


(78) 


(79) 


From Equation ( [691 ), we observe that R 3 is only different from R 2 by a sign of u, hence we can follow the same 
derivation strategy as the one presented for analyzing dR 2 /dT. We only highlight the differences for calculating 
T 2 /a‘^~P (we are using the same notations): 


^\ n™ HZ-\U\/fT>v/a)ril 2(-|r/|+o-Z;cr2 pt) 

i) iimo-_^o - I irri I 7._^^2 - = U, 


2 ) 


{l+Tp{p-l)r]^ ^{-\U\/(t+Z-,t)P 

rcZF 


<Co^r ^(cp;l)fjP '^{v/a-CpT^-p)(p{p/a+ 


CpT^-p) = 0 ( 1 ). 

Therefore, we can conclude limCT->.o = 0. Similar arguments hold for other integral calculations. We finally 
obtain 

dRs 


lim 


-liZ-^PZ = 0 . 


o--s>o dr 

Collecting the results from ( |69| ), ( fTT] ), ( [72] ), ( |79| ), and ( |^ , we achieve 


( 80 ) 


lim a 2 - 2 Pr* 2 ep 2 E|[/| 2 p -2 

(j^O 


in)Z (P(cpiTy^-p )2{1 - e)cp{r]+{cp]l))" 


-1 


After a simplification, we reach the conclusion 

^ 2 - 2 p 

lim 


2-p 

(1 -e)cp(r/+(cp;l ))2 


= 1 . 


CT^O 


{n)ZpHcp{n)^y epy2-p)E\UZ 2 • 
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Lemma 17. Suppose P^dC/l > p) = 1 with p being a fixed positive number and EgIJ/P < oo, then for 0 < p < 1, 

Rp{n{a),a) = e + + o((t*)^c7^“^p). 


Proof: We will use the same notation that was introduced in ( |69l ), and we analyze i2i, i? 2 , and R^ separately. 
Regarding i? 2 ^ we have 

poo POO 

R 2 — e = e I / ^ {pp{u/a + z-,t) — u/a — z)‘^(f){z)dzdF{u) + 

'/i J—uf cr-\-CpT^-P 

1 

POO p—uja+CpT'^-p 

^ 2z{pp{u/a + z;t) — u/a — z)(j){z)dzdF{u) — e / / z^4>{z)dzdF{u) 


POO PC 

y y 

POO PC 

y y 

^ /i —-i 


I fiL J —uja+CpT'^-P 

— Qi + Q2 F Qs 


I fl J —00 


By property (c) listed in the proof of Lemma 16 we have 


POO PC 

Qi = ep\^ / / 

J LL J —'i 


^ PpP ‘^{u/a + z;T)(j){z)dzdF{u). 


I fj, J—ufa+CpT^-P 

Using the same arguments (see the analysis of T 2 ) as in the proof of Lemma [T^ it is straightforward to show that 

Qi 


lim „ „ „„ 

(T-5-0 T^a^-^P 


= ep^E\U\^P-\ 


Regarding Q 2 , using integration hy parts and property (h) given at the beginning of the proof of Lemma 16 
obtain 

, _L _L /■“ 

Q2 = 2e{pp{cpT'^-^]T) - CpT^-p) j cj){-u/a + CpT'^-p)dF{u) - 

J fL 

Tp{p - l)p^~‘^{u/a + z; r) 


we 


/:/: 


2e 


11 J-ula+CpT^-p l + Tp{p-l)p^ ‘^{u/a + z;t) 


-(f){z)dzdF{u). 


We can directly see the first term on the right hand side of the above equation is bounded by 0{t^^{ p/(2a))). 
By using the same technique applied for analyzing T 2 , we then know the second term is of order Ta‘^~^. Hence, 
we have 

lim= 2 ep(l -p)E|[/|P“2 . 

cr-5>0 TCT^ P 

We now analyze Q 3 . A simple integration by parts yields, 

Q 3 = —e 


poo ^ ^ POO POO 

i (u/a — CpT^)(j)(u/a — CpT^)dF(u) + / / ^ 4>{z)dzdF(u) 

fL J fL J Uj (T — CpT'^-P 


Using the fact that (j)(z)dz ~ \(t>{t) and p/a — CpT^-p —)> + 00 , we can derive 


poo 


(u/a — CpT^-p)(j)(u/a — CpT^-f)dF(u) < / (u/a)<f)(u/(2a))dF(u) < <f)(p/(2a))E\U\, 


Fj) I ^ J Kjy \^Uj j yJ C-p 

ffl Jfl 

POO POO POO 


f 


^ 4>(z)dzdF(u) < / ^ (j)(z)dz < 0{l/{p/a — CpT'^-t‘)(j){p/a — CpT'^-p)). 

Ifi Juja-CpT^ dfi/a-CpT^ 


It is then straightforward to confirm that 


Qs 

lim —^ = 0. 


o—lO rcT^ P 

Combing the results of Qi, Q 2 and Qa, we obtain 


lim ^ = ep^E|[/p^ 

tr^O T^a^-^P 


( 81 ) 
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Because of the minor difference between i ?2 and (the sign of u, see more explanations in the proof of Lemma 

(82) 


161 , it is not hard to get 


lim ^ ^ = 0 . 


O'-5-0 

Regarding R^, we first derive an upper bound in the following way: 


7^4 = e 


POO p — ul(J+CpT^-P ^2 


' jjs J—uja—CpT'^-p ^ 


—^4>(z)dzdF(u) 

-i- /T^ 


< 2 eCpT^-pa ^ / u^(j){—u/a + CpT^-p)dF{u) < 2 eCpT'^-pa ’‘^(j){—^/a + CpT^-p)¥J\U\ 


- 2 , 


Since 0, as cr —)• 0, we have 


lim ^ ^ = 0 . 


cr-5-O 

We fianlly analyze Ri. A simple integration by parts proves 

roo roo 


(83) 


roo 

2(1 -e) ^ ril{z] T)(l){z)dz = -2(1 - e) 

JCryT'^-P 


- 2 ^ ^ 


d 4 >{z) 


= - 2 ( 1 -e) 


Vp{z-,T) 


(t){z) 


c„T^-p 


+ 2(1 ~ ^) f 

- 1/ Co 


2zr]p{z-,T)dir]p{z-,T) - r]‘^{z;T) 


4>{z)dz. (84) 


C^T^-P 


Since \rjp(z;T)\ < \z\, for the second integral in ((84|) we have 




2 zTjp{z-, T)dir]p{z; r) - ril{z-, r) 


4 >{z)dz 


(j){z)dz 


+ 


Jc^t2-p \l + Tp{p-l)r^ “^{z^T 

roo (1) roo o 

/ 4 >{z)dz < ^ ■^ 4 >{z)dz + / (t>{z)dz < { 2 C~^ + 1) / 

J C„T‘^~P J C„T‘^~P ^ J C^t'^~P J Cr,T‘^~P 


< 0{tp-^ (l){cpT^-p)), (85) 

where (1) is due to Lemma Hence the dominant term in ( [84l ) is the first term. More specifically, we have 


Ri 2 (l-e)(r/+(cp;l))^ 

lim ■ ^ ^ = —- —r 

T^-p (j){CpT'^-p) 


( 86 ) 


Putting the results from ( [M] l, ( [82l ), ( [ 8 ^ , ( [ 8 ^ , and Lemma [T^ we can conclude 

Rp_{Fj>^)-S ^ ep2^|c/|2p-2 
rV2-2p 


□ 


3) Auxiliary result for p = O.' In this previous section we characterized the risk of Rp{T^{a),a) for every 
0 < p < 1. The bounds we derived and the analysis we provided are not correct for p = 0. In this section 
we derive the corresponding expansion for p = 0. Similar to the previous section consider two random variable 
A ~ (1 - e)Ao + eG and ~ G, and define EcifiU)) = J f{u)dG{u) and Fg{U £ A) = EG(I(t/ G B)). 

Proposition 5. Suppose E|(7p < cx) and E(|(7| > p) = 1, where p = sup^{u : E(|(/| > u) = 1} > 0, then for 

p = 0 , 

Rp{n{cr),a) = e + o{4>{fla~^)), 


where p is any constant that smaller than 

The roadmap of the proof is similar to that of Proposition We characterize the convergence rate of t*((j) 
and derive the asymptotic formula for i?o(A(o'), a) in Lemma 18 and Lemma 19 respectively. Propositionthen 
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follows directly by combing the results of these two lemmas. For the sake of brevity, we will skip some calculation 
details. 


Lemma 18. Suppose E|[/p < oo and P{\U\ > p.) = 1, where p = sup^{u : P{\U\ > u) = 1} > 0. Then for 

p = 0, 

\\mJn{a)a = 

(T—>-0 ZCq 

where cq is the constant Cp with p = 0 introduced in Lemma 


Proof: By using the same arguments presented in the proof of Lemma 15 we can obtain r* (cr) —oo, as a —)• 0. 
Now we consider an arbitrary convergent sequence cjfc —)■ 0, as /c —)• oo, and show \/T^{ak)crk p/{2co). Denote 
limfc_^oo y /= CK- For notational simplicity, below we use exchangeably r and r*. Suppose a > p/cq, then 
by Fatou’s lemma, we have 

liminf E(r7o(i7/crA; + Z-, r*) — U/akf > liminf E[1(|C/ + akZ\ < CQ^/Tfak)U‘^ / crl] = oo. 

k^oo k^oo 

On the other hand, RQ{T^:,ak) < Ro{0,ak) = 1. This is a contradiction. Hence we get a < p/cq. Next we aim to 
show a < pI{2cq). Due to the explicit formula pq{u]t) = m1(|u| > coy/f), it is straightforward to derive 


Ro{T,a) = 2(1-e) co\/r(/>(co\/r) + j (j){z)dz + eE (^co\/t -+ j 


+eE 


a 


rPm)^r 

a J Jco^+m 


+ eE 


/ 




|[/|2 

— !^4>{z)dz, 


(l){z)dz 


— i?l + i?2 + P 3 + Pi- 


(87) 


Moreover, it is straightforward to show that r*((Tfc)> the optimal thresholding value, is finite and non-zero, and 
hence we have 9Ro{r,i^k),a-k) _ j 


dRi dR2 dR3 dR4_ 
dr ^ dr ^ dr ^ dr 


( 88 ) 


where we know 0 


dRf , 1^3 j — If /~\ dR2 

^ = (f - l)c„^/?0(c„/F), ^ = 


M)V(c 


(Tfc / V 

Ok > 


dR. 


-eco 


dr 2^ 
dRi eco 


E 


MV 


(7k ^ ^ O'k 


Sr 

A few more algebra calculations yields. 


^k 


rPP 

^k 


dRo dR^ dRi —ect 


+ 


+ 


dr dr dr 2ak 


E 


-ecn 


2 <Tfc 


lE 


TCTk + 


(^Coy/rcrfc - 2\U\)4>(^CorfT - 


\U\(t)(co^/T 


T + 


\E1) 

CTk 7 


—E 
O'k 


+ 

\U\ 

^k 


where the notation ~ indicates that they have the same orders in terms of au 0. Hence, dividing both sides of 
Equation ([ 88 ]) by ^/T(j){co^/T) and letting k ^ oo shows 


0 < lim E 

k^oo 


\U\exp(^ 


\u\i\u\ - 2akCoVf) \ 


-H 




< oo. 


(89) 


'"^The condition E|Fp < oo enables us to apply dominated convergence theorem to exchange the differentiation and expectation in the 
calculation of the partial derivatives. 
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If a > /i/(2co), then we see 


E 


|f7|exp(^ 


|f/|(|f/| -2afcCoVE) x 


-H 




> E 


|[/|exp(^ 


|f/|(|f/| -2cTfcCoVE) ^ 


- 2 ^i 


1{\U\ < 2aco) 


+ 00 . 


We have used Fatou’s lemma to obtain the last limit. Obviously the inequality above contradicts ( [89| ). Thus we obtain 
an upper bound /r/(2co) for a. Finally we would like to derive a > /i/(2co). First note that since a < /i/(2co), it 
is not hard to confirm that when k is large, 


ORa ^ ecoE|C/p 
dr ~ Jral 


T — 




= o 


1 


rat 


T — 




Based on the inequality above, we can further obtain 

dR2 dRs ORa 

+ ^ + 


dr dr dr 
Now suppose a < p/(2co), then it follows that 


<o 


rat 




T — 




(90) 


1 


/Fcj; 


T — 


1 


1 ( - 2coakV^)\ 

- )=»(!)■ 


(^k’ V^4>icoVr) Tal X -^^k 

However, this fact combined with ( [QO] ) implies that if we divide Equation ( [ 88 ] ) by letting /c —)• cx), 

we would get 

(e - l)c|] = 0 , 

which is a contradiction. Therefore, we have showed that for an arbitrary convergent sequence ak —)• 0, we have 
y/rjjy^ak /i/(2co), as A: —)• oo. This completes the proof. ■ 

Lemma 19. Suppose E|[/p < oo and P{\U\ > p) = 1, where p = sup^{u : P{\U\ > u) = 1} > 0. Then, for 

p = 0 

-Ro('r*(cr),o-) = e + 0 (v^(?i(coVn)), 

Proof: We use the same notations from the proof of Lemma [T^ Then, 

Roi^*^ ~ Pi T (R 2 — f) T P 3 T P4- 

Using the fact that ^r*(cr)cj —)■ ^ according to Lemma 18 from ( [87] ) we can easily obtain 

Ri = O(y/nf(co^yn)), R 2 - e = OIE 


M, 

L CJ 


n - 


\U\ 


a 


, i?3 = 0 E 


M 

a 


T* + 


\U\ 


a 


Regarding Ra, we have 

Ra = eE 

< eE 


|C/|^ 






POO 

j-OO 

f{z)dz - 

/ f{z)dz] 


P-ff+coVr: > _ 






(j){z)dz 


< eE 


vH-- 


a 


COy/T, 


\U\ 


\U\ - CQUyflf 


< O E 




Furthermore, from (|89j) we can see 


E 


CoV"^- 


1 


-^/ry <?i(co-^/fy) 


= 0 ( 1 ). 


o \ <7 

Putting together what we have derived so far shows 

Ro(n(a),a) - e = 0(^/nf(co^/n)). 
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I. Proof of Proposition 

In order to prove this proposition, we require several preliminary results. Define 

+ o-Z; (3cr) - Xf, 

where /3 denotes a fixed number greater than zero. 


Lemma 20. [411 For every /3 > 0, ® concave function of a^. 

A simple corollary of this result is that has a unique stable fixed point. Refer to |411 for more information 

on this lemma. 


Lemma 21. Let X ~ (1 — e)Ao + eG with e < 1. Let A*(cr) denote the optimal thresholding policy for f’l-AMR 
Then, 

n T- 

0 < lim - < oo. 


O-2-5.0 a 


Proof: We start by assuming lim^a-^o 
/3 > 0 , we have 




exists. We first show that the limit is not zero. Note that for every 


< 1 . 


(91) 


This is due to the fact that A*((t) is the optimal thresholding policy and outperforms all the other thresholding 
policies including A(cr) = fda for a fixed j3. Define T*(cr) = . Our goal is fo show fhaf if r*((T) —)■ 0 as a —)• 0, 

then the ratio specified in ( [9T] ) will be larger than 1 for all the /3 around zero which is in contradiction with ( |9T] ). 
Note that 

.. E(r/i(X + crZ;r*((T)fT) - X)2 _ E(? 7 i(X/cr + Z; r*(a))-X/o-)^ 

^2^0 E(r/i(X + aZ;/3(T)-X)2 “ E(? 7 i(X/a + Z;/3) - X/c7)2 ’ 

Consider a random variable f7 ~ G. Then, ( |92l ) can be simplified in the following wayf^ 

,. ^A.,i(o'^) ^ ,. e&{r]i{U/a + Z;n(o-)) - U/af + (1 - e)E(??f(Z; n((j))) 

^^,i(ct 2) eE(r?i([//cT + Z;/3)-C//a)2 + (l-e)E(r/2(Z;/3)) 

1 


(92) 


6(l + /32) + (l-e)E(r?2(Z;/3))’ 

where the last equality is due to the assumption that r* (a) —)• 0 as a —)■ 0. Note that for /3 = 0, the numerator and 
denominator will be the same and hence the ratio is equal to one. However, a simple calculation shows that 


—e(l + /3^) + (1 — e)E{r]l{Z; (3)) 


f}=o 


poo 

-4(1 — e) / z4>{z)dz < 0. 

Jo 


Hence, for (3 in the neighborhood of zero, the ratio in ( [93] ) will be greater than 1, which is in contradiction with 
(ED and consequently r* (cr) ^ 0 . 

We now discuss the other part of the lemma, i.e., the proof of 


A* (a) 

lim - < oo. 

cr2-i,o a 


As before, define r*(iT) = 


A A.((t) 


and consider a random variable U ^ G. From the derivation in (931, we know that 


oo > lim ^ = lim eE(r/i(f7/cr + Z; r*((T)) — C/cr)^ + (1 — e)E(7yf(Z; r*(iT))) 

a ^—^0 cr ^ 0 - 2—>0 

> lim eE(r/i(f7/cr + Z;r*((T)) — f7/cr)^. 


(94) 


'^We have used dominated convergence theorem in these calculations. It is straightforward to prove that the conditions for this theorem 
hold. But, for the sake of brevity and since we have studied similar problems in the proof of Theorem we do not check the conditions 
here. 
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Suppose T^{a) —> oo. Since 771 (u; A) = sign(u)(|u| — A)+, we can easily see the following, 


{T]i{U/a + Z] T*(cr)) - > min{(Z - r*(cr))^, {Z + n(cr))^, f/^/cr^} —)• + 00 , as a 

Hence, by Fatou’s lemma, we conclude 


lim E(77 i(C//(T + Z;r*(cT)) — f7/cr) 

cr^-s-O 


CX), 




which contradicts Inequality 

So far, we have proved if lim^a-^o 
convergent sequence cr„ —)• 0 such that lim, 
sequence. Hence 0 < a < 00 . Similar to 


exists, it must be a finite non-zero number. Now we consider any 
= a. Then all the arguments presented before work for the 


■n^oo Q 

we can obtain 


1 > lim 

n—>-oo 


e(l + + (1 - (E’; a)) 


(95) 


e(l + /52) + (1 - e)E(r}f(Z; /?)) ’ 

for any /3 > 0. It is straightforward to confirm fhaf e(l + /3^) + (1 — €)E(r/f(Z; /?)), as a funcfion of /3, is sfricfly 
convex and has a unique minimizer over [0,cx)). Denofe fhaf global opfima by /3*. If we choose /3 = in ( |95] ), 
we can immediately conclude a = / 3 *. Since we have been discussing an arbifrary convergenf sequence, if implies 
fhaf limCT 2 _s.o = /3^. This completes fhe proof. 

Wifh fhis background information, we can now prove Proposifion 

Proof: For simplicify, we only consider fhe noiseless seffing in fhe proof. The uniqueness of fhe fixed poinf in 
fhe noisy case follows similar argumenfs. We sfarf proving fhe uniqueness by confradicfion. Suppose fhaf T'a.,! 
has fwo fixed poinfs 0 < erf < erf. Define /3* = and consider a new fhresholding policy A((t) = /3*er. 

According fo Lemma 20 we know fhaf has only one sfable fixed poinf. Thaf fixed poinf is clearly 


eri. Therefore, ?/;/ 3 .^i(er^ < for every er^ > erf. Now since 'I'A.,i(cr 2 ) = ^2 ^2 > *^ 1 ’ conclude fhaf 

7 / 7 / 3 ^(erf) < 'I'A,,i(erf). This is in confradicfion wifh fhe facf fhaf A*((t) is fhe optimal fhresholding policy. Therefore, 
'I'a,,i has af mosf one fixed poinf above zero. 

If zero is nof a sfable fixed poinf, fhen according fo Proposifion [T] rkA.,! has af leasf one non-zero sfable fixed 
poinf, hence if has a unique sfable fixed poinf above zero. Finally, we show fhaf if zero is a sfable fixed poinf, fhen 
'I'a,,i does nof have any ofher fixed poinf. Define Note fhaf according fo Lemma 


21 


0 < lim r*((T) < 00 . 
0 - 2 -rO 


Lef limCT 2 -s.o A(fr) = /3* and U ^ G. Then we have 


d^K,iicr‘^) 


da"^ 


0 - 2=0 


= lim 

O-2-S.0 




cr^ 


= - lim eE{rii{U/a + Z]T^{a))-U/a)"^ + {I - e)E{rjl{Z-,n{a))) 


6 (l + (/3*)2) + (l-e)E [r,l{Z-n) 

6 

min/3>o e(l + + (1 - e)lE /3)) 


(96) 


Nofe fhaf fhe lasf fwo equalifies above can be obfained from fhe argumenfs in fhe proof of Lemma 21 If 0 is a 
sfable fixed poinf, fhen 

(iT'A.,i(cr^) 


dcr^ 


< 1 . 


0 - 2=0 


If is sfraighfforward fo confirm fhaf ^ is the same as the derivative of 'ip/ 3 »^i{cr^) at zero. However, 

since 7 /)^._i(cr^) is concave and its derivative at zero is less than 1 , it will not have any other fixed poinf and 
V^/ 3 *,i(f^^) < for every cr^ > 0. Hence if '1 'a,,i(o'^) has anofher fixed poinf af ctq > 0, we conclude fhaf 


iti:!) 


< ^A.,i(fro), 


which is in confradicfion wifh fhe opfimalify of A*. 
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It is now straightforward to characterize the phase transition of the optimal-A £i-AMP. Note that according to 
our discussion, = 0 is the unique fixed point if and only if 






< 1 . 


2=0 


Combining (|96|) and (97 1 finishes fhe proof. 


(97) 

□ 


J. Proof of Theorem 

Lef denofe fhe smallesf value of a af which ^ is equal fo one. If if does nol exisf, we sef d = oo. 

According fo Theorem fhe derivafive of af = 0 equals fo |. Since e < 5, we conclude fhaf 

^ ^ ^ every a‘^ < Define ctq = {ao = oo if d = oo). Note that (Tq > 0, since 

'I'a,,p( 0) = 0 and the derivative of 'I'a.,p(ct^) is less than one for every cj^ < d^. Our next step is to show that for 
every cj^ < (Tq, the equation 

^2 = ^ 2 + 


has one solution in [0, d^]. Define r((T^) = cr^ — 'Pa.,p(c 7^) — Nofe fhaf r(0) < 0 and r(d 2 )> 0 . Furfhermore, 
if is sfraighfforward fo see fhaf fhe derivafive of r((T^) is posifive and hence if is an increasing funcfion. Thus 
— 'Fa.,p(o'^) “ = 0 has exacfly one solution in the range [0,d^]. This is the lowest fixed point of = 

(^w + ^A,,p(<7^). By employing the implicit function theorem, we conclude that 


daj 1 

da^ 1 _ <^^a„,p(o-2) 

^ do-2 


(98) 


Therefore, a'j, as a function of cr^, is differentiable and has finite derivative for any cr^ < Uq. According to Theorem 
1^ e < 5 and cr^ = 0 leads to cr| = 0. Hence, the continuity of aj implies that 

lim ct| = 0. (99) 

^0 


Combining ( |9^ and ( [99] ) we conclude that 
lim = lim 


2->0 da^ I _ <^'t'A.,p(o-2) 


do -2 


G = Gi 


where the last equality is from the proof of Theorem 


1 


1 - 


'i'I'A.,p(o-") 

da'^ 


C7=0 



e ’ 

5 


K. Proof of Theorem 

This proof is essentially a combination of the results we obtained in the proofs of Theorem and Proposition 
1^ Note that as we proved in Proposition the stable fixed poinf of 


is unique and we have used fhe nofafion cj| fo refer fo fhis unique fixed poinf. Moreover, since Mi(e) < <5, we 
know = 0 when = 0 from Proposifion Similar fo (^i, we have 


daj 

dal, 


1 - 


d^A.,l('A2) 

do -2 


( 100 ) 


Finally, we already know from Proposition fhaf 
<9^'a.,i('7^) 


da"^ 


0 - 2=0 


= inf e(l + a^) + (1 - e)E[r/f (Z; a)] = Mi(e). 

Q !>0 


( 101 ) 


Using fhe confinuify arguments of aj as in Theorem combined with ( |100[ ) and ( |101[ ), completes the proof. 






























48 


L. Proof of Proposition 
Define 

ra,p{(y'^) = IE(r?p(X + crZ; {aa/cpf~^) - Xf, 

where the expected value is with respect to two independent random variables X ~ (1 —e)Ao+eG and Z r\j iV(0,l). 
Cp is the constant introduced in Lemma a is a fixed positive number. Note that according to Lemma the 
thresholding policy (aa/cp)^"^ that is used in the definition of ra^p{(j‘^) ensures that r/p(n; {aa/cp)‘^~^) = 0 for 
|m| < aa and for every 0 < p < 1. Furthermore, note that ^ra,p{o''^) is equal to 'I'j; for the thresholding 

policy Aq,((t) = (aa/cp)‘^~P. We start with several lemmas that are important in the proof of Proposition]^ 

Lemma 22. For large values of a, we have 

Tq^P^CT ) ~ Fa^pO' , 

where 

r„,pAE(r? 2 (Z; (a/cpf-P)). 

Furthermore, Lq i < Fq^p for every 0 < p < 1 and a > 0. 

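Proof: Let $X$ denote a random variable with distribution $(1-\epsilon)\delta_0 + \epsilon G$ and let $U \sim G$ be another random variable. We have

$$\begin{aligned}
\lim_{\sigma\to\infty}\frac{r_{\alpha,p}(\sigma^2)}{\sigma^2}
&= \lim_{\sigma\to\infty}\frac{(1-\epsilon)\mathbb{E}\,\eta_p^2(\sigma Z;(\alpha\sigma/c_p)^{2-p}) + \epsilon\,\mathbb{E}\big(\eta_p(U+\sigma Z;(\alpha\sigma/c_p)^{2-p}) - U\big)^2}{\sigma^2}\\
&\overset{(a)}{=} \lim_{\sigma\to\infty}(1-\epsilon)\mathbb{E}\,\eta_p^2(Z;(\alpha/c_p)^{2-p}) + \epsilon\,\mathbb{E}\big(\eta_p(U/\sigma+Z;(\alpha/c_p)^{2-p}) - U/\sigma\big)^2\\
&\overset{(b)}{=} \mathbb{E}\big(\eta_p^2(Z;(\alpha/c_p)^{2-p})\big),
\end{aligned}$$

where Equality (a) is according to Lemma. To obtain Equality (b), we have assumed that the limit and expectation are interchangeable; the proof is similar to the one we presented in Section VI-H and is hence skipped. Furthermore, according to Corollary,

$$\eta_p^2(u;(\alpha/c_p)^{2-p}) > \eta_1^2(u;\alpha/c_1) \quad \forall |u| > \alpha,\ 0 \le p < 1,$$

$$\eta_p^2(u;(\alpha/c_p)^{2-p}) = \eta_1^2(u;\alpha/c_1) \quad \forall |u| < \alpha,\ 0 \le p < 1.$$

Hence,

$$\Gamma_{\alpha,1} < \Gamma_{\alpha,p}.$$

□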
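Equality (a) above rests on the scaling $\eta_p(\sigma u;\sigma^{2-p}\lambda) = \sigma\,\eta_p(u;\lambda)$, which follows by substituting $x = \sigma y$ in the proximal objective. The sketch below checks this scaling numerically and illustrates $\Gamma_{\alpha,1} < \Gamma_{\alpha,p}$ in the one case with simple closed forms, $p = 0$. It assumes $\eta_p(u;\lambda) = \arg\min_x \frac{1}{2}(u-x)^2 + \lambda|x|^p$, with $|x|^0$ read as $\mathbb{1}\{x \neq 0\}$, which gives $c_0 = \sqrt{2}$ and $c_1 = 1$. The grid-search helper `eta_p` is a hypothetical illustration, not the authors' implementation.

```python
import numpy as np
from math import erf, exp, pi, sqrt


def eta_p(u, lam, p, grid=200_001):
    """Brute-force eta_p(u; lam) = argmin_x 0.5*(u - x)**2 + lam*|x|**p, 0 <= p <= 1.

    The minimizer has the sign of u and magnitude in [0, |u|], so we search a
    uniform grid on [0, |u|]. For p = 0 the penalty |x|**0 is read as 1{x != 0}.
    """
    a = abs(u)
    x = np.linspace(0.0, a, grid)
    pen = np.where(x > 0, x ** p, 0.0)  # |x|^p with the convention 0^p = 0
    obj = 0.5 * (a - x) ** 2 + lam * pen
    return float(np.sign(u) * x[np.argmin(obj)])


def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def phi(x):
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)


def gamma_soft(alpha):
    """Gamma_{alpha,1} = E[eta_1(Z; alpha)^2]  (c_1 = 1, threshold alpha)."""
    return 2.0 * ((1.0 + alpha ** 2) * Phi(-alpha) - alpha * phi(alpha))


def gamma_hard(alpha):
    """Gamma_{alpha,0} = E[Z^2 1{|Z| > alpha}]  (c_0 = sqrt(2), threshold alpha)."""
    return 2.0 * (alpha * phi(alpha) + Phi(-alpha))


# Scaling behind Equality (a): eta_p(s*u; s^(2-p)*lam) = s * eta_p(u; lam).
u, lam, p, s = 2.0, 0.7, 0.5, 4.0
lhs, rhs = eta_p(s * u, s ** (2 - p) * lam, p), s * eta_p(u, lam, p)
print(lhs, rhs)

# Comparison Gamma_{alpha,1} < Gamma_{alpha,0} for a few values of alpha.
for alpha in (0.5, 1.0, 2.0, 3.0):
    print(alpha, gamma_soft(alpha), gamma_hard(alpha))
```

The brute-force search is slow but transparent; it is meant only to make the scaling argument and the comparison tangible.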


We can employ this lemma to obtain the following result for the performance of $\ell_p$-AMP with thresholding policy $\lambda_\alpha(\sigma) = (\alpha\sigma/c_p)^{2-p}$. Although this result is not needed in our proof of Proposition, we include it here since it is an interesting application of the above lemma.


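Corollary 6. Suppose $\Gamma_{\alpha,p} < \delta$. Let $\sigma_\infty^2$ denote the lowest fixed point of $\ell_p$-AMP with thresholding policy $\lambda_\alpha(\sigma) = (\alpha\sigma/c_p)^{2-p}$, where $c_p$ is the constant introduced in Lemma and $\alpha$ is a fixed number. For large values of $\sigma_w^2$, we have

$$\frac{\sigma_\infty^2}{\sigma_w^2} = \frac{1}{1-\Gamma_{\alpha,p}/\delta} + o(1).$$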


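Proof: First note that, according to the state evolution equation, $\sigma_\infty^2$ satisfies

$$\sigma_\infty^2 = \sigma_w^2 + \frac{1}{\delta}r_{\alpha,p}(\sigma_\infty^2). \qquad (102)$$

Thus $\sigma_\infty^2 \ge \sigma_w^2$, and hence $\sigma_\infty^2 \to \infty$ as $\sigma_w^2 \to \infty$. Dividing both sides of (102) by $\sigma_\infty^2$ and combining the result with Lemma 22, we have

$$\lim_{\sigma_w^2\to\infty}\frac{\sigma_\infty^2}{\sigma_w^2} = \lim_{\sigma_\infty^2\to\infty}\frac{1}{1-\frac{r_{\alpha,p}(\sigma_\infty^2)}{\delta\sigma_\infty^2}} = \frac{1}{1-\Gamma_{\alpha,p}/\delta}.$$

□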
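Corollary 6 can be checked numerically in the one case where every quantity is explicit, $p = 1$ (soft thresholding, $c_1 = 1$, so $\lambda_\alpha(\sigma) = \alpha\sigma$). The sketch below iterates the state evolution (102) upward from $\sigma^2 = \sigma_w^2$ and compares $\sigma_\infty^2/\sigma_w^2$ with $1/(1-\Gamma_{\alpha,1}/\delta)$ for a large $\sigma_w^2$. The signal model (a point mass at $\mu = 1$ with probability $\epsilon$), the parameter values, and the Riemann-sum quadrature are assumptions made for illustration.

```python
import numpy as np
from math import erf, exp, pi, sqrt


def soft(v, lam):
    """Soft thresholding eta_1(v; lam), elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


# Riemann-sum quadrature for expectations over Z ~ N(0, 1).
z = np.linspace(-12.0, 12.0, 240_001)
w = np.exp(-z ** 2 / 2.0) / np.sqrt(2.0 * np.pi) * (z[1] - z[0])


def risk(sigma2, alpha, eps, mu):
    """r_{alpha,1}(sigma^2) for X ~ (1 - eps)*delta_0 + eps*delta_mu."""
    s = np.sqrt(sigma2)
    null = np.sum(w * soft(s * z, alpha * s) ** 2)
    signal = np.sum(w * (soft(mu + s * z, alpha * s) - mu) ** 2)
    return (1.0 - eps) * null + eps * signal


def lowest_fixed_point(sigma_w2, delta, alpha, eps, mu, iters=100):
    """Iterate sigma^2 <- sigma_w^2 + r(sigma^2)/delta, starting from sigma_w^2."""
    s2 = sigma_w2
    for _ in range(iters):
        s2 = sigma_w2 + risk(s2, alpha, eps, mu) / delta
    return s2


alpha, delta, eps, mu = 1.5, 0.5, 0.1, 1.0
Phi_minus = 0.5 * (1.0 + erf(-alpha / sqrt(2.0)))
Gamma = 2.0 * ((1.0 + alpha ** 2) * Phi_minus - alpha * exp(-alpha ** 2 / 2.0) / sqrt(2.0 * pi))

sigma_w2 = 1e4
ratio = lowest_fixed_point(sigma_w2, delta, alpha, eps, mu) / sigma_w2
predicted = 1.0 / (1.0 - Gamma / delta)
print(ratio, predicted)
```

For large $\sigma_w^2$ the signal contributes only an $O(1)$ correction to $r_{\alpha,1}(\sigma^2)$, which is why the two printed numbers agree closely.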

We state another corollary of Lemma 22 that is important in our proof. Let $\alpha_p^*(\sigma^2)$ denote the value of $\alpha$ that minimizes $r_{\alpha,p}(\sigma^2)$. If $r_{\alpha,p}(\sigma^2)$ reaches its infimum as $\alpha \to \infty$, we set $\alpha_p^*(\sigma^2)$ to infinity.


Corollary 7. For every value of $p$, $\alpha_p^*(\sigma^2) \to \infty$ as $\sigma^2 \to \infty$.


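Proof: Suppose this is not true. Then there exists a sequence $\sigma_n \to \infty$ as $n \to \infty$ such that $\alpha_p^*(\sigma_n^2) \to \alpha^* < \infty$. From the proof of Lemma 22 it is straightforward to confirm that

$$\lim_{n\to\infty}\frac{r_{\alpha_p^*(\sigma_n^2),p}(\sigma_n^2)}{\sigma_n^2} = \Gamma_{\alpha^*,p} > 0. \qquad (103)$$

However, since $\alpha_p^*(\sigma_n^2)$ is the optimal thresholding value, we know

$$r_{\alpha_p^*(\sigma_n^2),p}(\sigma_n^2) \le \lim_{\alpha\to\infty} r_{\alpha,p}(\sigma_n^2) = \mathbb{E}(X^2),$$

so $r_{\alpha_p^*(\sigma_n^2),p}(\sigma_n^2)/\sigma_n^2 \to 0$, which contradicts (103).

□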


Lemma 23. Recall the notation $\eta_p(\cdot\,;\cdot)$. If $\eta_p(\alpha;(\alpha/c_p)^{2-p}) > 2\mu/\sigma$ and $\alpha < \infty$, then we have

$$r_{\alpha,1}(\sigma^2) < r_{\alpha,p}(\sigma^2).$$


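Proof: Define $\mu_\sigma \triangleq \mu/\sigma$, $\alpha_{c,p} \triangleq (\alpha/c_p)^{2-p}$, and $\xi_p(\mu_\sigma+z;\alpha_{c,p}) \triangleq \eta_p(\mu_\sigma+z;\alpha_{c,p}) - \eta_1(\mu_\sigma+z;\alpha_{c,1})$. Note that $\xi_p(\mu_\sigma+z;\alpha_{c,p}) = 0$ for $|\mu_\sigma+z| \le \alpha$, $\xi_p(\mu_\sigma+z;\alpha_{c,p}) \ge 0$ for $\mu_\sigma+z \ge \alpha$, and $\xi_p(\mu_\sigma+z;\alpha_{c,p}) \le 0$ for $\mu_\sigma+z \le -\alpha$. We have

$$\begin{aligned}
\frac{r_{\alpha,p}(\sigma^2)}{\sigma^2} &= (1-\epsilon)\mathbb{E}\eta_p^2(Z;\alpha_{c,p}) + \epsilon\mathbb{E}\big(\eta_p(\mu_\sigma+Z;\alpha_{c,p}) - \mu_\sigma\big)^2\\
&= (1-\epsilon)\mathbb{E}\eta_p^2(Z;\alpha_{c,p}) + \epsilon\mathbb{E}\big(\eta_1(\mu_\sigma+Z;\alpha_{c,1}) + \xi_p(\mu_\sigma+Z;\alpha_{c,p}) - \mu_\sigma\big)^2\\
&= (1-\epsilon)\mathbb{E}\eta_p^2(Z;\alpha_{c,p}) + \epsilon\mu_\sigma^2 + \epsilon\mathbb{E}\big[(\eta_1(\mu_\sigma+Z;\alpha_{c,1}) - 2\mu_\sigma)\eta_1(\mu_\sigma+Z;\alpha_{c,1})\big]\\
&\quad + \epsilon\mathbb{E}\big[(2\eta_1(\mu_\sigma+Z;\alpha_{c,1}) - 2\mu_\sigma)\xi_p(\mu_\sigma+Z;\alpha_{c,p})\big] + \epsilon\mathbb{E}\big(\xi_p(\mu_\sigma+Z;\alpha_{c,p})\big)^2,
\end{aligned} \qquad (104)$$

where the first equality is due to Lemma. Note that

$$\frac{r_{\alpha,1}(\sigma^2)}{\sigma^2} = (1-\epsilon)\mathbb{E}\eta_1^2(Z;\alpha_{c,1}) + \epsilon\mu_\sigma^2 + \epsilon\mathbb{E}\big[(\eta_1(\mu_\sigma+Z;\alpha_{c,1}) - 2\mu_\sigma)\eta_1(\mu_\sigma+Z;\alpha_{c,1})\big]. \qquad (105)$$

According to Corollary, we have $\eta_1^2(Z;\alpha_{c,1}) \le \eta_p^2(Z;\alpha_{c,p})$ for every $Z$, with strict inequality when $|Z| > \alpha$. Hence

$$(1-\epsilon)\mathbb{E}\eta_p^2(Z;\alpha_{c,p}) > (1-\epsilon)\mathbb{E}\eta_1^2(Z;\alpha_{c,1}). \qquad (106)$$

Combining (104), (105), and (106), we conclude that if we prove

$$\mathbb{E}\big[(2\eta_1(\mu_\sigma+Z;\alpha_{c,1}) - 2\mu_\sigma)\xi_p(\mu_\sigma+Z;\alpha_{c,p})\big] + \mathbb{E}\big(\xi_p(\mu_\sigma+Z;\alpha_{c,p})\big)^2 > 0, \qquad (107)$$

then

$$r_{\alpha,1}(\sigma^2) < r_{\alpha,p}(\sigma^2).$$

Hence, in the rest of the proof we focus on showing (107). First note that by employing Corollary it is straightforward to conclude

$$\eta_1(\mu_\sigma+Z;\alpha_{c,1})\cdot\xi_p(\mu_\sigma+Z;\alpha_{c,p}) \ge 0. \qquad (108)$$

Furthermore, according to Lemma, if $|\eta_p(u;\lambda)| > 0$, then $\partial_1\eta_p(u;\lambda) \ge 1$. Since $\partial_1\eta_1(u;\lambda) = 1$ wherever $\eta_1(u;\lambda) \neq 0$, we conclude that if $|\mu_\sigma+Z| > \alpha$, then

$$|\xi_p(\mu_\sigma+Z;\alpha_{c,p})| \ge \eta_p(\alpha;\alpha_{c,p}).$$

Hence, if $|\mu_\sigma+Z| > \alpha$, then by (108)

$$(2\eta_1(\mu_\sigma+Z;\alpha_{c,1}) - 2\mu_\sigma)\xi_p(\mu_\sigma+Z;\alpha_{c,p}) + \xi_p^2(\mu_\sigma+Z;\alpha_{c,p}) \ge |\xi_p(\mu_\sigma+Z;\alpha_{c,p})|\big(\eta_p(\alpha;\alpha_{c,p}) - 2\mu_\sigma\big) > 0, \qquad (109)$$

where the last inequality uses the assumption $\eta_p(\alpha;\alpha_{c,p}) > 2\mu_\sigma$. Since $\xi_p$ vanishes when $|\mu_\sigma+Z| \le \alpha$ and $\mathbb{P}(|\mu_\sigma+Z| > \alpha) > 0$, (109) yields (107). This completes the proof of our lemma. □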

The following lemma enables us to complete the proof.

Lemma 24 ([42], Proposition 3.8). If $r_{\alpha,1}(\sigma^2)$ denotes the risk of the soft thresholding function, then for large values of $\alpha$,

$$\frac{d\,r_{\alpha,1}(\sigma^2)}{d\alpha} > 0.$$

See [42] for the exact value of $\alpha$ above which the derivative is positive.


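Corollary 8. For every value of $\sigma$, we have

$$\inf_{\alpha>0} r_{\alpha,1}(\sigma^2) < \lim_{\alpha\to\infty} r_{\alpha,p}(\sigma^2).$$

The proof is a straightforward combination of Lemma 24 and the fact that $\lim_{\alpha\to\infty} r_{\alpha,p}(\sigma^2) = \lim_{\alpha\to\infty} r_{\alpha,1}(\sigma^2)$.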

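Now we return to the proof of Proposition. We only give a sketch of the proof, since the details are straightforward. According to Corollary 7, for large values of $\sigma$, $\alpha_p^*$ is large; hence $\alpha_{c,p}^* = (\alpha_p^*/c_p)^{2-p}$ is large as well. Hence, by Lemma, we can assume that $\eta_p(\alpha_p^*;\alpha_{c,p}^*) > 2\mu/\sigma$ for large $\sigma$. Suppose that $\alpha_p^* < \infty$. Then, according to Lemma 23, we have

$$\inf_{\alpha>0} r_{\alpha,1}(\sigma^2) \le r_{\alpha_p^*,1}(\sigma^2) < r_{\alpha_p^*,p}(\sigma^2).$$

If $\alpha_p^* = \infty$, then

$$\inf_{\alpha>0} r_{\alpha,1}(\sigma^2) < r_{\alpha_p^*,p}(\sigma^2) = \lim_{\alpha\to\infty} r_{\alpha,p}(\sigma^2),$$

where the first inequality is due to Lemma 24. In either case, we have shown that

$$\inf_{\alpha>0} r_{\alpha,1}(\sigma^2) < \inf_{\alpha>0} r_{\alpha,p}(\sigma^2) \qquad (110)$$

for large values of $\sigma$. The last step of the proof is to connect this result with the fixed points of the state evolution equation.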

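Note that the fixed points of the state evolution must satisfy

$$\sigma^2 = \sigma_w^2 + \frac{1}{\delta}r_{\alpha_p^*,p}(\sigma^2).$$

Therefore, as $\sigma_w^2 \to \infty$, the lowest fixed point $\sigma_\infty^2$ goes off to $\infty$. On the other hand, Inequality (110) implies that the function $\sigma_w^2 + \frac{1}{\delta}r_{\alpha_p^*,p}(\sigma^2)$ lies above $\sigma_w^2 + \frac{1}{\delta}r_{\alpha_1^*,1}(\sigma^2)$ over a range $(\tilde\sigma^2,\infty)$. Hence, we can increase $\sigma_w^2$ to make sure that the fixed points $\sigma_\infty^2$ of both $\ell_1$-AMP and $\ell_p$-AMP fall into that range. Then clearly the lowest fixed point of $\ell_1$-AMP is smaller than that of $\ell_p$-AMP.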


M. Proof of Theorem 

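We first remind the reader of the definition

$$R_p(\tau,\sigma) \triangleq (1-\epsilon)\mathbb{E}\eta_p^2(Z;\tau) + \epsilon\mathbb{E}\big(\eta_p(U/\sigma+Z;\tau) - U/\sigma\big)^2,$$

where $Z \sim \mathcal{N}(0,1)$ and $U \sim G$ are independent. Also let $\tau^*(\sigma)$ denote the optimal $\tau$ that minimizes $R_p(\tau,\sigma)$. In Proposition we proved that, as $\sigma \to 0$,

$$R_p(\tau^*(\sigma),\sigma) = \epsilon + \epsilon p^2\mathbb{E}|U|^{2p-2}(\tau^*(\sigma))^2\sigma^{2-2p} + o\big((\tau^*(\sigma))^2\sigma^{2-2p}\big),$$

where the convergence rate of $\tau^*(\sigma)$ can be characterized by

$$\lim_{\sigma\to0}\frac{\sigma^{2-2p}}{(\tau^*(\sigma))^{\frac{2p-1}{2-p}}\,\phi\big(c_p(\tau^*(\sigma))^{\frac{1}{2-p}}\big)} = \frac{(1-\epsilon)c_p\,\eta_p^2(c_p;1)}{\epsilon p^2(2-p)\mathbb{E}|U|^{2p-2}}. \qquad (111)$$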
Here we aim to analyze the lowest fixed point of the optimally tuned $\ell_p$-AMP, $\sigma_\ell^2$, which satisfies

$$\sigma_\ell^2 = \sigma_w^2 + \frac{\sigma_\ell^2}{\delta}R_p(\tau^*(\sigma_\ell),\sigma_\ell). \qquad (112)$$

We focus on the regime where the sparsity level $\epsilon$ is below the phase transition of the lowest fixed point, i.e., $\delta > \epsilon$. In Theorem we have already proved that

$$\lim_{\sigma_w\to0}\frac{\sigma_\ell^2}{\sigma_w^2} = \frac{\delta}{\delta-\epsilon}. \qquad (113)$$


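Now we first characterize the following limit:

$$\begin{aligned}
\lim_{\sigma_w\to0}\frac{\sigma_\ell^2 - \frac{\delta}{\delta-\epsilon}\sigma_w^2}{\sigma_w^{4-2p}(\tau^*(\sigma_\ell))^2}
&\overset{(a)}{=} \frac{1}{\delta-\epsilon}\lim_{\sigma_w\to0}\frac{\sigma_\ell^2\big(R_p(\tau^*(\sigma_\ell),\sigma_\ell) - \epsilon\big)}{\sigma_w^{4-2p}(\tau^*(\sigma_\ell))^2}\\
&= \frac{1}{\delta-\epsilon}\lim_{\sigma_w\to0}\frac{R_p(\tau^*(\sigma_\ell),\sigma_\ell) - \epsilon}{(\tau^*(\sigma_\ell))^2\sigma_\ell^{2-2p}}\cdot\lim_{\sigma_w\to0}\Big(\frac{\sigma_\ell^2}{\sigma_w^2}\Big)^{2-p}\\
&\overset{(b)}{=} \frac{\epsilon p^2\mathbb{E}|U|^{2p-2}\delta^{2-p}}{(\delta-\epsilon)^{3-p}}.
\end{aligned} \qquad (114)$$

We have used (112) to obtain (a); the derivation of (b) is due to (113) and Proposition. Our next step is to show that

$$\lim_{\sigma_w\to0}\frac{\sigma_\ell^2 - \frac{\delta}{\delta-\epsilon}\sigma_w^2}{\sigma_w^{4-2p}(\tau^*(\sigma_w))^2} = \frac{\epsilon p^2\mathbb{E}|U|^{2p-2}\delta^{2-p}}{(\delta-\epsilon)^{3-p}},$$

that is, that $\tau^*(\sigma_\ell)$ can be replaced by $\tau^*(\sigma_w)$ in (114).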


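First note that, according to Lemma 15, $\tau^*(\sigma_w) \to \infty$ and $\tau^*(\sigma_\ell) \to \infty$ as $\sigma_w \to 0$. We can use (111) to prove that $\tau^*(\sigma_\ell)/\tau^*(\sigma_w) \to 1$ as $\sigma_w \to 0$ in the following way. Applying (111) at $\sigma = \sigma_w$ and at $\sigma = \sigma_\ell$ and taking the ratio, we obtain

$$\lim_{\sigma_w\to0}\frac{\sigma_\ell^{2-2p}}{\sigma_w^{2-2p}}\cdot\frac{(\tau^*(\sigma_w))^{\frac{2p-1}{2-p}}\phi\big(c_p(\tau^*(\sigma_w))^{\frac{1}{2-p}}\big)}{(\tau^*(\sigma_\ell))^{\frac{2p-1}{2-p}}\phi\big(c_p(\tau^*(\sigma_\ell))^{\frac{1}{2-p}}\big)} = 1.$$

By applying (113) we reach

$$\lim_{\sigma_w\to0}\frac{(\tau^*(\sigma_w))^{\frac{2p-1}{2-p}}\phi\big(c_p(\tau^*(\sigma_w))^{\frac{1}{2-p}}\big)}{(\tau^*(\sigma_\ell))^{\frac{2p-1}{2-p}}\phi\big(c_p(\tau^*(\sigma_\ell))^{\frac{1}{2-p}}\big)} = \Big(1-\frac{\epsilon}{\delta}\Big)^{1-p},$$

which implies

$$\lim_{\sigma_w\to0}\Big[\frac{2p-1}{2-p}\log\frac{\tau^*(\sigma_w)}{\tau^*(\sigma_\ell)} - \frac{c_p^2}{2}\Big((\tau^*(\sigma_w))^{\frac{2}{2-p}} - (\tau^*(\sigma_\ell))^{\frac{2}{2-p}}\Big)\Big] = (1-p)\log\Big(1-\frac{\epsilon}{\delta}\Big).$$

If we combine this with the fact that $\tau^*(\sigma_\ell) \to \infty$ and $\tau^*(\sigma_w) \to \infty$, then we conclude that

$$\lim_{\sigma_w\to0}\frac{\log\big(\tau^*(\sigma_w)/\tau^*(\sigma_\ell)\big)}{(\tau^*(\sigma_w))^{\frac{2}{2-p}} - (\tau^*(\sigma_\ell))^{\frac{2}{2-p}}} = 0,$$

so the difference $(\tau^*(\sigma_w))^{2/(2-p)} - (\tau^*(\sigma_\ell))^{2/(2-p)}$ remains bounded. This in turn implies that

$$\lim_{\sigma_w\to0}\frac{(\tau^*(\sigma_\ell))^{2/(2-p)}}{(\tau^*(\sigma_w))^{2/(2-p)}} = 1, \quad\text{i.e.,}\quad \lim_{\sigma_w\to0}\frac{\tau^*(\sigma_\ell)}{\tau^*(\sigma_w)} = 1. \qquad (115)$$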

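Combining (114) and (115) proves that

$$\lim_{\sigma_w\to0}\frac{\sigma_\ell^2 - \frac{\delta}{\delta-\epsilon}\sigma_w^2}{\sigma_w^{4-2p}(\tau^*(\sigma_w))^2} = \frac{\epsilon p^2\mathbb{E}|U|^{2p-2}\delta^{2-p}}{(\delta-\epsilon)^{3-p}},$$

where $\tau^*(\sigma_w)$ satisfies (111). It is then straightforward to use (111) and the fact that $\tau^*(\sigma_w) \to \infty$ to show that

$$\lim_{\sigma_w\to0}\frac{(\tau^*(\sigma_w))^{\frac{2}{2-p}}}{\log(1/\sigma_w)} = \frac{4(1-p)}{c_p^2}.$$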
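For completeness, here is one way to see the last limit from (111), using $\phi(x) = e^{-x^2/2}/\sqrt{2\pi}$. The shorthand $\tau_w = \tau^*(\sigma_w)$ and $C_p$ for the right-hand side of (111) is introduced only for this derivation. Taking logarithms in (111) gives

$$\lim_{\sigma_w\to0}\Big[(2-2p)\log\sigma_w - \frac{2p-1}{2-p}\log\tau_w + \frac{c_p^2}{2}\tau_w^{\frac{2}{2-p}} + \frac{1}{2}\log(2\pi)\Big] = \log C_p.$$

Since $\tau_w \to \infty$, we have $\log\tau_w = o\big(\tau_w^{2/(2-p)}\big)$, and therefore

$$\frac{c_p^2}{2}\,\tau_w^{\frac{2}{2-p}}\,(1+o(1)) = (2-2p)\log\frac{1}{\sigma_w} + O(1) \quad\Longrightarrow\quad \lim_{\sigma_w\to0}\frac{\tau_w^{2/(2-p)}}{\log(1/\sigma_w)} = \frac{2(2-2p)}{c_p^2} = \frac{4(1-p)}{c_p^2}.$$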
N. Proof of Theorem

We first remind the reader of the definition

$$R_0(\tau,\sigma) \triangleq (1-\epsilon)\mathbb{E}\eta_0^2(Z;\tau) + \epsilon\mathbb{E}\big(\eta_0(U/\sigma+Z;\tau) - U/\sigma\big)^2,$$

where $Z \sim \mathcal{N}(0,1)$ and $U \sim G$ are independent. Also let $\tau^*(\sigma)$ denote the optimal $\tau$ that minimizes $R_0(\tau,\sigma)$. In Proposition we proved that, as $\sigma \to 0$, if $\tilde\mu \triangleq \sup\{\mu : \mathbb{P}(|U| > \mu) = 1\} > 0$, then for $p = 0$,

$$R_0(\tau^*(\sigma),\sigma) = \epsilon + o\big(\phi(\hat\mu\sigma^{-1})\big),$$

where $\hat\mu$ is any constant that is smaller than

The rest of the proof is similar to the proof in Section VI-M and is hence skipped.


O. Proof of Theorem 10 


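Let $X$ and $U$ denote two independent random variables with distributions $(1-\epsilon)\delta_0 + \epsilon G$ and $G$, respectively. As before, our only assumption on $G$ is that it does not have any point mass at zero. Note that $\sigma_h^2$ satisfies the following fixed point equation:

$$\begin{aligned}
\sigma_h^2 &= \sigma_w^2 + \frac{1}{\delta}\inf_{\tau>0}\Big[(1-\epsilon)\mathbb{E}\eta_p^2(\sigma_h Z;\tau) + \epsilon\mathbb{E}\big(\eta_p(U+\sigma_h Z;\tau) - U\big)^2\Big]\\
&= \sigma_w^2 + \frac{\sigma_h^2}{\delta}\inf_{\tau>0}\Big[(1-\epsilon)\mathbb{E}\eta_p^2(Z;\tau\sigma_h^{p-2}) + \epsilon\mathbb{E}_U\Big(\mathbb{E}_Z\big(\eta_p(U/\sigma_h+Z;\tau\sigma_h^{p-2}) - U/\sigma_h\big)^2\Big)\Big]\\
&\le \sigma_w^2 + \frac{\sigma_h^2}{\delta}\inf_{\tau>0}\Big[(1-\epsilon)\mathbb{E}\eta_p^2(Z;\tau\sigma_h^{p-2}) + \epsilon\sup_{\mu>0}\mathbb{E}\big(\eta_p(\mu/\sigma_h+Z;\tau\sigma_h^{p-2}) - \mu/\sigma_h\big)^2\Big]\\
&= \sigma_w^2 + \frac{\sigma_h^2}{\delta}\inf_{\tau>0}\Big[(1-\epsilon)\mathbb{E}\eta_p^2(Z;\tau\sigma_h^{p-2}) + \epsilon\sup_{\mu>0}\mathbb{E}\big(\eta_p(\mu+Z;\tau\sigma_h^{p-2}) - \mu\big)^2\Big]\\
&= \sigma_w^2 + \frac{\sigma_h^2}{\delta}\inf_{\tau>0}\Big[(1-\epsilon)\mathbb{E}\eta_p^2(Z;\tau) + \epsilon\sup_{\mu>0}\mathbb{E}\big(\eta_p(\mu+Z;\tau) - \mu\big)^2\Big]\\
&= \sigma_w^2 + \frac{\sigma_h^2}{\delta}M_p(\epsilon).
\end{aligned}$$

Whenever $M_p(\epsilon) < \delta$, rearranging gives $\sigma_h^2 \le \sigma_w^2/(1 - M_p(\epsilon)/\delta)$.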


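For the proof of the second part of the theorem, consider the following definitions:

$$\tau_p^* \triangleq \arg\min_{\tau}\sup_{\mu}\Big(\epsilon\,\mathbb{E}\big(\eta_p(\mu+Z;\tau) - \mu\big)^2 + (1-\epsilon)\mathbb{E}\big(\eta_p(Z;\tau)\big)^2\Big),$$

$$\mu^* \triangleq \arg\max_{\mu}\Big(\epsilon\,\mathbb{E}\big(\eta_p(\mu+Z;\tau_p^*) - \mu\big)^2 + (1-\epsilon)\mathbb{E}\big(\eta_p(Z;\tau_p^*)\big)^2\Big).$$

Note that, for notational simplicity, we have assumed that the maxima and minima are achieved. Also define

$$(\sigma_h^*)^2 \triangleq \frac{\sigma_w^2}{1 - M_p(\epsilon)/\delta}. \qquad (116)$$

We consider a distribution $G$ that has a point mass at $\mu^*\sigma_h^*$. For this distribution, the right-hand side of the state evolution equation evaluated at $\sigma^2 = (\sigma_h^*)^2$ equals

$$\begin{aligned}
&\sigma_w^2 + \frac{(\sigma_h^*)^2}{\delta}\inf_{\tau>0}\Big[(1-\epsilon)\mathbb{E}\eta_p^2\big(Z;\tau(\sigma_h^*)^{p-2}\big) + \epsilon\mathbb{E}\big(\eta_p(\mu^*+Z;\tau(\sigma_h^*)^{p-2}) - \mu^*\big)^2\Big]\\
&= \sigma_w^2 + \frac{(\sigma_h^*)^2}{\delta}\Big[(1-\epsilon)\mathbb{E}\eta_p^2(Z;\tau_p^*) + \epsilon\mathbb{E}\big(\eta_p(\mu^*+Z;\tau_p^*) - \mu^*\big)^2\Big]\\
&= \sigma_w^2 + \frac{(\sigma_h^*)^2}{\delta}M_p(\epsilon) = (\sigma_h^*)^2.
\end{aligned} \qquad (117)$$

Hence, $(\sigma_h^*)^2$ is a fixed point of the state evolution. If it is an unstable fixed point, we can use the argument presented for Proposition to show that there is another stable fixed point above $(\sigma_h^*)^2$.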
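P. Proof of Theorem

Let $X \sim (1-\epsilon)\delta_0 + \epsilon G$, where $G$ is an arbitrary distribution that does not have any point mass at zero. Also let $U \sim G$ denote a random variable. Then we have

$$\sigma_h^2 = \sigma_w^2 + \frac{1}{\delta}\inf_{0\le p\le1,\,\lambda>0}\mathbb{E}\big(\eta_p(X+\sigma_h Z;\lambda) - X\big)^2 \overset{(a)}{\le} \sigma_w^2 + \frac{1}{\delta}M_p(\epsilon)\,\sigma_h^2,$$

where (a) follows from arguments similar to those in Section VI-O. Hence, we conclude that

$$\sigma_h^2 \le \frac{\sigma_w^2}{1 - M_p(\epsilon)/\delta}.$$

This implies that if $\sigma_w^2 \to 0$, then $\sigma_h^2 \to 0$. Moreover, it is straightforward to see that

$$\lim_{\sigma_w^2\to0}\frac{\sigma_h^2}{\sigma_w^2} = \frac{1}{1 - \frac{1}{\delta}\lim_{\sigma^2\to0}\frac{1}{\sigma^2}\inf_{0\le p\le1,\,\lambda>0}\mathbb{E}\big(\eta_p(X+\sigma Z;\lambda) - X\big)^2}.$$

From the proof of Theorem, we know that for every fixed $p < 1$,

$$\lim_{\sigma^2\to0}\frac{1}{\sigma^2}\inf_{\lambda>0}\mathbb{E}\big(\eta_p(X+\sigma Z;\lambda) - X\big)^2 = \epsilon. \qquad (118)$$

Hence, it is straightforward to show that

$$\limsup_{\sigma^2\to0}\frac{1}{\sigma^2}\inf_{0\le p\le1,\,\lambda>0}\mathbb{E}\big(\eta_p(X+\sigma Z;\lambda) - X\big)^2 \le \limsup_{\sigma^2\to0}\frac{1}{\sigma^2}\inf_{\lambda>0}\mathbb{E}\big(\eta_p(X+\sigma Z;\lambda) - X\big)^2 = \epsilon. \qquad (119)$$

Define $r(\sigma^2) = \mathbb{E}\big(\mathbb{E}(X \mid X+\sigma Z) - X\big)^2$. Since $\mathbb{E}(X \mid X+\sigma Z)$ is the minimum mean square error estimator, we have

$$\inf_{0\le p\le1,\,\lambda>0}\mathbb{E}\big(\eta_p(X+\sigma Z;\lambda) - X\big)^2 \ge r(\sigma^2).$$

Hence,

$$\liminf_{\sigma\to0}\frac{1}{\sigma^2}\inf_{0\le p\le1,\,\lambda>0}\mathbb{E}\big(\eta_p(X+\sigma Z;\lambda) - X\big)^2 \ge \lim_{\sigma\to0}\frac{r(\sigma^2)}{\sigma^2} = \epsilon, \qquad (120)$$

where the last equality is a combination of Theorems 5 and 8 in [43]. Note that (119) and (120) together show that the limit in the denominator above exists and equals $\epsilon$. Then, by using similar arguments as in Theorem, we can show that $\sigma_h^2$ is the unique stable fixed point when $\sigma_w^2$ is small enough. Hence the implicit function theorem can be applied to claim the continuity of $\sigma_h^2$ as a function of $\sigma_w^2$ in a neighborhood of 0. Combining (118), (119), and (120) finishes the proof.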


VII. Optimal ℓp-AMP in practice


A. Stein Unbiased Risk Estimate 


The optimal $\ell_p$-AMP algorithm introduced in Section II-C employs the following thresholding policy:

$$\lambda^*(\sigma) \in \arg\min_{\lambda>0}\mathbb{E}\big(\tilde\eta_{p,h}(X+\sigma Z;\lambda) - X\big)^2,$$

where the expected value is with respect to both $Z \sim \mathcal{N}(0,1)$ and $X \sim p_X$. Note that $\lambda^*$ is a function of both $\sigma$ and $p_X$. While it is possible to provide a good estimate of $\sigma$, coming up with a good estimate of $p_X$ is very challenging, if not impossible, in many applications. The question we would like to answer in this section is whether we can provide an accurate estimate of $\lambda^*(\sigma)$ without any knowledge of $p_X$ in practice. Similar questions can be asked regarding the optimal choice of $p$, introduced in Section II-D, or even the optimal choice of $h$. We answer these questions in this section. Our approach is motivated by Stein's Unbiased Risk Estimate (SURE), which we briefly summarize here. Let $x_o \in \mathbb{R}^N$ and suppose that we observe $x = x_o + \nu$ with $\nu \sim \mathcal{N}(0,\sigma^2 I)$. To estimate $x_o$ we employ a denoiser $\mathcal{D}: \mathbb{R}^N \to \mathbb{R}^N$. Can we estimate the risk of this denoiser, i.e.,

$$r_{\mathcal{D}} = \mathbb{E}\|\mathcal{D}(x) - x_o\|_2^2?$$
If the answer is affirmative, then the risk estimate can be employed for tuning the free parameters of the denoiser or for comparing different denoisers. Note that the main challenge in estimating the risk $r_{\mathcal{D}}$ is that $x_o$ is not known. The following theorem due to Stein provides a simple way to find an unbiased estimate of $r_{\mathcal{D}}$.


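Lemma 25 [44]. Let $\mathcal{D}(x)$ denote the denoiser. If $\mathcal{D}$ is weakly differentiable, then

$$\mathbb{E}\|\mathcal{D}(x) - x_o\|_2^2/N = \mathbb{E}\|\mathcal{D}(x) - x\|_2^2/N + \sigma^2 + 2\sigma^2\,\mathbb{E}\big(\mathbf{1}^T(\nabla\mathcal{D}(x) - \mathbf{1})\big)/N, \qquad (121)$$

where $\nabla\mathcal{D}(x) = \big[\frac{\partial\mathcal{D}_1(x)}{\partial x_1}, \ldots, \frac{\partial\mathcal{D}_N(x)}{\partial x_N}\big]^T$ and $\mathbf{1}$ is an all-one vector.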


Note that the terms inside the expectation on the right-hand side of (121) do not depend on $x_o$. This enables us to provide the following estimate of the risk function:

$$\hat r_{\mathcal{D}} = \|\mathcal{D}(x) - x\|_2^2/N + \sigma^2 + 2\sigma^2\big(\mathbf{1}^T(\nabla\mathcal{D}(x) - \mathbf{1})\big)/N.$$

According to Lemma 25, we have $r_{\mathcal{D}}/N = \mathbb{E}(\hat r_{\mathcal{D}})$. Hence, $\hat r_{\mathcal{D}}$ provides an unbiased estimate of $r_{\mathcal{D}}/N$. SURE has been used elsewhere for model selection [45]. Our next goal is to employ the idea of SURE for the $\ell_p$-AMP algorithm.
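As a quick illustration of Lemma 25 (not part of the paper's experiments), the sketch below applies $\hat r_{\mathcal{D}}$ to soft thresholding, for which $\mathbf{1}^T\nabla\mathcal{D}(x)$ is simply the number of coordinates with $|x_i| > \lambda$. The sparse Bernoulli-Gaussian test signal and all numeric values are assumptions. The point is that the SURE value, computed without $x_o$, lands close to the realized risk, which requires $x_o$.

```python
import numpy as np


def soft(v, lam):
    """Soft thresholding eta_1(v; lam), elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sure_soft(x, lam, sigma):
    """hat r_D from Lemma 25 for D = soft thresholding.

    For soft thresholding, 1^T grad D(x) = #{i : |x_i| > lam}.
    """
    N = x.size
    div = np.count_nonzero(np.abs(x) > lam)
    return np.sum((soft(x, lam) - x) ** 2) / N + sigma ** 2 + 2.0 * sigma ** 2 * (div - N) / N


rng = np.random.default_rng(1)
N, sigma, lam = 200_000, 0.5, 0.8
x0 = np.where(rng.random(N) < 0.1, 2.0 * rng.standard_normal(N), 0.0)  # sparse x_o
x = x0 + sigma * rng.standard_normal(N)                                 # x = x_o + nu

true_risk = np.mean((soft(x, lam) - x0) ** 2)  # ||D(x) - x_o||^2 / N, needs x_o
sure = sure_soft(x, lam, sigma)                # uses only x and sigma
print(f"realized risk {true_risk:.4f}, SURE {sure:.4f}")
```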


B. ℓp-AMP and SURE


Can SURE be employed to estimate $\lambda^*(\sigma)$ or $p^*(\sigma^2)$ for the optimal $\ell_p$-AMP? As we discussed in Section VII-A, SURE can be used for denoising problems in which the noise is Gaussian. Also, as we discussed in Section I-B, if we define $v^t = A^T z^t + x^t - x_o$, then we can write $x^t + A^T z^t = x_o + v^t$, where $v^t$ resembles an iid Gaussian random vector in the asymptotic setting. Hence, at an intuitive level, we should be able to use SURE to estimate the optimal parameters of $\ell_p$-AMP. This intuition is in fact valid, and we formalize it below. We only consider the estimation of $\lambda^*(\sigma)$, but the approach can be extended to the tuning of the other parameters as well.


Consider the iterations of $\ell_p$-AMP with the optimal thresholding policy $\lambda^*(\sigma)$. If the algorithm starts at $\sigma_0$, then the first threshold is $\lambda^*(\sigma_0)$. Note that $\lambda^*(\sigma_0)$ is the value of $\lambda$ that minimizes $\lim_{N\to\infty}\frac{1}{N}\|\tilde\eta_{p,h}(x_o + v^0;\lambda) - x_o\|_2^2$. Once we run $\ell_p$-AMP with this threshold, the standard deviation at the next iteration will be $\sigma_1$, and hence the next threshold will be $\lambda^*(\sigma_1)$; again, this is the value of $\lambda$ that minimizes $\lim_{N\to\infty}\frac{1}{N}\|\tilde\eta_{p,h}(x_o + v^1;\lambda) - x_o\|_2^2$. This discussion reveals two main properties of the optimal thresholding policy:

(i) We do not have to estimate the entire function $\lambda^*(\sigma)$. We only need to estimate it at the values $\sigma_0, \sigma_1, \sigma_2, \ldots$ that are actually observed in the $\ell_p$-AMP algorithm.

(ii) At iteration $t$, $\lambda^*(\sigma_t)$ is the value of $\lambda$ that minimizes the risk $\lim_{N\to\infty}\frac{1}{N}\|\tilde\eta_{p,h}(x_o + v^t;\lambda) - x_o\|_2^2$.

These two conclusions imply that if at iteration $t$ we find the value of $\lambda$ that minimizes $\lim_{N\to\infty}\frac{1}{N}\|\tilde\eta_{p,h}(x_o + v^t;\lambda) - x_o\|_2^2$, then the resulting $\ell_p$-AMP algorithm will perform the same as optimal-$\lambda$ $\ell_p$-AMP. Hence, the problem of finding the optimal thresholding policy for $\ell_p$-AMP reduces to the problem of tuning the parameters of $\ell_p$-AMP at a single iteration (without taking the other iterations into account). If the iterations of $\ell_p$-AMP are given by


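$$x^t = \tilde\eta_{p,h}(A^T z^{t-1} + x^{t-1};\lambda_t),$$

$$z^t = y - Ax^t + \frac{1}{\delta}z^{t-1}\big\langle\tilde\eta_{p,h}'(A^T z^{t-1} + x^{t-1};\lambda_t)\big\rangle, \qquad (122)$$

then the optimal value of $\lambda_t$ at iteration $t$ is the value of $\lambda$ that minimizes

$$r_{p,h}^t(\lambda) \triangleq \lim_{N\to\infty}\frac{1}{N}\|\tilde\eta_{p,h}(x_o + v^t;\lambda) - x_o\|_2^2.$$

Since we know that $v^t$ is almost Gaussian, inspired by SURE, we consider the following empirical estimate of the risk at iteration $t$:

$$\hat r_{p,h,\lambda}^t \triangleq \frac{1}{N}\|\tilde\eta_{p,h}(x_o + v^t;\lambda) - x_o - v^t\|_2^2 - \sigma_t^2 + \frac{2\sigma_t^2}{N}\mathrm{div}\big(\tilde\eta_{p,h}(x_o + v^t;\lambda)\big), \qquad (123)$$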
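where div denotes the divergence of $\tilde\eta_{p,h}$ and is defined as $\mathrm{div}\big(\tilde\eta_{p,h}(x_o + v^t;\lambda)\big) = \sum_{i=1}^N \tilde\eta_{p,h}'(x_{o,i} + v_i^t;\lambda)$. The following theorem confirms that in the asymptotic setting $\hat r_{p,h,\lambda}^t$ provides an accurate estimate of the risk function, i.e., $\lim_{N\to\infty}\hat r_{p,h,\lambda}^t = r_{p,h}^t(\lambda)$.

Theorem 14. Let $\{x_o(N), A(N), w(N)\}$ denote a converging sequence. Let the parameters $\lambda_1, \lambda_2, \ldots, \lambda_{t-1}$ denote the threshold parameters of $\ell_p$-AMP for the first $t-1$ iterations, and let $x^t(N)$ and $z^t(N)$ denote the estimates of $\ell_p$-AMP according to (122). Then,

$$\lim_{N\to\infty}\hat r_{p,h,\lambda}^t \overset{a.s.}{=} \mathbb{E}\big(\tilde\eta_{p,h}(X + \sigma_t Z;\lambda) - X\big)^2, \qquad (124)$$

where $\sigma_t$ satisfies the following iteration:

$$\sigma_t^2 = \sigma_w^2 + \frac{1}{\delta}\mathbb{E}\big(\tilde\eta_{p,h}(X + \sigma_{t-1}Z;\lambda_{t-1}) - X\big)^2. \qquad (125)$$

Here the expected value is with respect to two independent random variables $Z \sim \mathcal{N}(0,1)$ and $X \sim p_X$, and $\sigma_0^2$ depends on the initialization of the algorithm.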


The proof of this result can be found in the Appendix. According to the above theorem, the empirical risk provides an accurate estimate of $\mathbb{E}\big(\tilde\eta_{p,h}(X + \sigma_t Z;\lambda) - X\big)^2$ for large values of $N$. Hence, one can estimate the optimal values of $\lambda_t$ and $p_t$ in the following way:

$$(\lambda_t, p_t) \in \arg\min_{\lambda,p}\hat r_{p,h,\lambda}^t.$$

Note that the empirical risk can even be employed for finding the optimal value of the parameter $h$. However, to reduce the computational complexity, we set $h$ automatically. This approach will be explained in the next section.
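The single-iteration tuning rule above can be sketched for the special case $p = 1$, where $\tilde\eta_{p,h}$ is replaced by plain soft thresholding and the divergence is a count of surviving coordinates. This is a minimal illustration under stated assumptions, not the authors' implementation. It estimates $\sigma_t$ by $\|z^t\|_2/\sqrt{n}$ (as done in Section VII-C), parameterizes $\lambda_t = \tau_t\sigma_t$, searches $\tau_t$ over a fixed grid, and uses problem sizes similar to those in Section VII-C. The helper names `soft`, `sure_soft`, and `amp_sure` are hypothetical.

```python
import numpy as np


def soft(v, lam):
    """Soft thresholding eta_1(v; lam), elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def sure_soft(v, lam, sigma):
    """Empirical risk (123) for soft thresholding; the divergence counts |v_i| > lam."""
    N = v.size
    div = np.count_nonzero(np.abs(v) > lam)
    return np.sum((soft(v, lam) - v) ** 2) / N - sigma ** 2 + 2.0 * sigma ** 2 * div / N


def amp_sure(y, A, iters=30, taus=np.linspace(0.2, 3.0, 57)):
    """l1-AMP in the form (122); lam_t = tau_t * sigma_hat_t with tau_t chosen by SURE."""
    n, N = A.shape
    x, z = np.zeros(N), y.copy()
    for _ in range(iters):
        sigma = np.linalg.norm(z) / np.sqrt(n)           # sigma_hat_t = ||z^t|| / sqrt(n)
        v = x + A.T @ z                                  # behaves like x_o + sigma_t * noise
        risks = [sure_soft(v, t * sigma, sigma) for t in taus]
        lam = taus[int(np.argmin(risks))] * sigma
        x = soft(v, lam)
        # Onsager term: (1/delta) * <eta'> = (#nonzeros / N) * (N / n)
        z = y - A @ x + z * np.count_nonzero(x) / n
    return x


rng = np.random.default_rng(0)
N, n, k, sigma_w = 5000, 1000, 40, 0.1                   # delta = 0.2, sigma_w^2 = 0.01
A = rng.standard_normal((n, N)) / np.sqrt(n)
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.choice([-1.0, 1.0], k)
y = A @ x0 + sigma_w * rng.standard_normal(n)

x_hat = amp_sure(y, A)
mse = np.mean((x_hat - x0) ** 2)
print(f"per-coordinate MSE: {mse:.2e} (signal power {np.mean(x0 ** 2):.2e})")
```

Extending the sketch to $p < 1$ would require an implementation of $\tilde\eta_{p,h}$ and its derivative, which is not reproduced here.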


C. Simulation Result 

In this section, we would like to compare our asymptotic results with simulations performed at finite values of $N$. As we will show, our asymptotic results provide accurate predictions of the performance of the algorithm even for moderate sample sizes, such as $N = 5000$. Furthermore, we present the results of the tuning approach we proposed for the $\ell_p$-AMP algorithm and show that this SURE-based tuning approach is accurate even for medium problem sizes.

1) State Evolution versus ℓp-AMP: In this section, the predictions given by the state evolution are compared with the performance observed in Monte Carlo simulations. For the Monte Carlo simulations, the dimension of the sparse vector $x_o$ is set to $N = 5000$, which is relatively large. The measurement matrix $A$ has iid Gaussian entries, and the number of measurements is set to $n = 1000$, i.e., $\delta = 0.2$. There are 40 nonzero elements in $x_o$. We run $\ell_p$-AMP for $T = 30$ iterations. We set the thresholding policy to $\lambda(\sigma) = \tau\sigma^{2-p}$, where $\tau$ is a fixed number. The value of $\tau$ may differ across simulations and is given in each figure caption. The empirical MSE reported in the
figures is the average of 100 Monte Carlo simulations. Finally h is set to h = at/N^^^ at iteration t. We have 
not optimized over the parameter $h$; we have empirically observed that $\ell_p$-AMP with this choice of $h$ performs well. An accurate analysis of the effect of $h$ on the performance of $\ell_p$-AMP is left for future research.

Figures 14 and 15 compare the results of the Monte Carlo simulations with the SE when the nonzero elements of the sparse vector are equiprobable ±1 and Gaussian, respectively. The bars show 95% confidence intervals. As can be seen from the figures, the empirical results are reasonably close to the theoretical predictions obtained from the state evolution.


2) Optimal tuning of λ: Our goal in this section is to show the accuracy, at finite sample sizes, of the parameter selection technique we proposed in Section VII-B. As discussed in Section VII-B, for the optimal tuning of $\lambda_t$ we can employ the following estimate:

$$\hat\lambda_t \in \arg\min_{\lambda>0}\ \frac{1}{N}\|\tilde\eta_{p,h}(x^t + A^T z^t;\lambda) - x^t - A^T z^t\|_2^2 - \sigma_t^2 + \frac{2\sigma_t^2}{N}\mathrm{div}\big(\tilde\eta_{p,h}(x^t + A^T z^t;\lambda)\big).$$
[Figure 14: panels (a) to (d); horizontal axis: iteration]

Fig. 14. Comparison of the theoretical prediction and the Monte Carlo simulation result for (a) p = 0, (b) p = 0.3, (c) p = 0.5 and (d) p = 0.8. In the four cases, parameter τ is set to 0.2, 0.3, 0.5 and 1 in Figures (a), (b), (c), and (d), respectively. The nonzero elements of the sparse vector are equiprobable ±1 in this scenario. In Figures (b) and (c), the error of the recovery does not converge to 0 under the parameter setting of the scenario, but SE still provides an accurate prediction.


This requires an estimate of $\sigma_t$ at every iteration. It is straightforward to use the results of [31] and prove that $\|z^t\|_2/\sqrt{n} \to \sigma_t$ as $N \to \infty$. Hence, in our simulations we employ the estimate

$$\hat\sigma_t = \|z^t\|_2/\sqrt{n}.$$

In this section, we would like to show that even at moderately large sample sizes ($N = 5000$), $\hat\lambda_t$ is close to the optimal threshold $\lambda_t^*$. Our simulation settings are as follows. The dimension of the sparse vector $x_o$ is set to $N = 5000$. The elements of the measurement matrix $A$ are iid Gaussian. The number of measurements is set to $n = 1000$, i.e., $\delta = 0.2$. $x_o$ has only 40 nonzero elements, which are equiprobable ±1. The variance of the measurement noise is set to $\sigma_w^2 = 0.01$. We have run the simulations for $p = 0, 0.3, 0.5, 0.8$. In each iteration, $\lambda_t = \tau_t\sigma_t^{2-p}$ is optimized using the empirical risk function.

The risk and its SURE estimate in the third iteration are plotted against $\tau_t$ in Figure 16. As can be seen from the figure, the SURE estimate is reasonably close to the true value of the risk function.


VIII. Conclusion and future work

We have studied the performance of both the $\ell_p$-regularized least squares (LPLS) problem and approximate message passing (AMP), which aims to solve LPLS. Employing the state evolution framework, we have derived conditions under which $\ell_p$-AMP for $0 < p < 1$ outperforms $\ell_1$-AMP. It turns out that in the noiseless setting, if the algorithm is initialized properly, it can outperform $\ell_1$-AMP by a large margin. We applied the replica method to connect our results to LPLS. It turns out that, in the noiseless regime, the phase transitions of LPLS are exactly the same for every $p < 1$. We also studied the performance of these algorithms in the presence of measurement noise. We showed that for small values of measurement noise, $p = 0$ outperforms the other values of $p$. However, when the measurement noise is large, $p = 1$ outperforms the other values of $p$.

There are many questions that we have left for future research. For instance, extending this approach to other types of structure, such as group sparsity or low rank, is an open direction that needs to be explored in the future. Such research may shed some light on the benefit of non-convex penalties for these popular structures. Finally, it is not yet clear whether it is possible to design algorithms that can find the global minima of LPLS in the asymptotic setting. As we have shown in this paper, below a certain sparsity level, $\epsilon^*(\delta)$, message passing algorithms may recover the global minima of LPLS. Beyond this level, however, they may be trapped at a fixed point that is different from the global minima of LPLS. Unfortunately, $\epsilon^*(\delta)$ is much below the actual phase transition of LPLS for $p < 1$. Whether we can find an algorithm that is better than message passing for these problems is a major open question that may have a major impact on the field of compressed sensing.

Fig. 15. Comparison of the theoretical prediction and the Monte Carlo simulation result for (a) p = 0, (b) p = 0.3, (c) p = 0.5 and (d) p = 0.8. In the four cases, parameter τ is set to 0.18, 0.4, 0.5 and 1, respectively. The nonzero elements of the sparse vector are Gaussian distributed in this scenario. In Figures (a), (b) and (c), the error of the recovery does not converge to 0 under the parameter setting of the scenario, but SE still gives a reasonably accurate prediction of the performance of the ℓp-AMP algorithm.


Appendix 


Our main objective in this section is to prove Theorem 14. We start with the following lemma, which will be used later in the proof.

Lemma 26. Let $f: \mathbb{R} \to \mathbb{R}$ denote a differentiable function with bounded derivative, i.e., $|f'(u)| \le M$ for every $u$. Then,

$$|f(s_1 + b_1) - f(s_0 + b_0)| \le \sqrt{2}\,M\sqrt{(s_1 - s_0)^2 + (b_1 - b_0)^2}.$$

Proof: According to the mean value theorem, we have

$$|f(s_1 + b_1) - f(s_0 + b_0)| = \left|\left[\frac{\partial f(s+b)}{\partial s}, \frac{\partial f(s+b)}{\partial b}\right]_{(s,b)=(s^*,b^*)}\begin{bmatrix} s_1 - s_0\\ b_1 - b_0\end{bmatrix}\right|, \qquad (126)$$

where $(s^*, b^*)$ is a point on the line that connects $(s_0, b_0)$ and $(s_1, b_1)$. Applying the Cauchy-Schwarz inequality to (126) and using the fact that, at $(s^*, b^*)$, $\frac{\partial f(s+b)}{\partial s} = f'(s^* + b^*)$ and $\frac{\partial f(s+b)}{\partial b} = f'(s^* + b^*)$, we can finish the proof.

□
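A quick numerical sanity check of Lemma 26, assuming $f = \sin$ (so $M = 1$) and randomly drawn points; it only illustrates the bound and is not a substitute for the proof above.

```python
import numpy as np

rng = np.random.default_rng(0)
f, M = np.sin, 1.0                                  # |f'(u)| = |cos u| <= 1
s0, b0, s1, b1 = 5.0 * rng.standard_normal((4, 10_000))
lhs = np.abs(f(s1 + b1) - f(s0 + b0))
rhs = np.sqrt(2.0) * M * np.sqrt((s1 - s0) ** 2 + (b1 - b0) ** 2)
print("bound holds at all sampled points:", bool(np.all(lhs <= rhs + 1e-12)))
```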
Fig. 16. The actual risk function and its unbiased estimate (SURE) as a function of τ_t. We set (a) p = 0, (b) p = 0.3, (c) p = 0.5 and (d) p = 0.8. The figure shows the result for the third iteration. The power of the measurement noise is set to σ_w^2 = 0.01.


We now turn to the proof of Theorem 14. The proof employs Theorem 2 of [31]. This theorem confirms that, under the conditions we presented in Theorem 14, we have

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N J(v_i^t, x_{o,i}) \overset{a.s.}{=} \mathbb{E}J(\sigma_t Z, X),$$

where $Z \sim \mathcal{N}(0,1)$ and $X \sim p_X$ are two independent random variables, and $J$ is a Lipschitz function of $v_i^t$ and $x_{o,i}$. Note that

$$\hat r_{p,h,\lambda}^t = \frac{1}{N}\|\tilde\eta_{p,h}(x_o + v^t;\lambda) - x_o - v^t\|_2^2 - \sigma_t^2 + \frac{2\sigma_t^2}{N}\mathrm{div}\big(\tilde\eta_{p,h}(x_o + v^t;\lambda)\big). \qquad (127)$$

If we prove that both $|\tilde\eta_{p,h}(x_{o,i} + v_i^t;\lambda) - x_{o,i} - v_i^t|^2$ and $\tilde\eta_{p,h}'(x_{o,i} + v_i^t;\lambda)$ are Lipschitz functions of $(x_{o,i}, v_i^t)$, then we can characterize the limit of $\hat r_{p,h,\lambda}^t$. Hence, as the next step, we prove that these two quantities are Lipschitz. We proved in Section VI-C that $\tilde\eta_{p,h}(u;\lambda)$ is a differentiable function of $u$ with $\sup_u|\tilde\eta_{p,h}'(u;\lambda)|$ bounded. Furthermore, it is straightforward to show that $|\tilde\eta_{p,h}(u;\lambda) - u|$ is bounded. Hence, $(\tilde\eta_{p,h}(u;\lambda) - u)^2$ has a bounded derivative. Combining this fact with Lemma 26, we know that $|\tilde\eta_{p,h}(x_{o,i} + v_i^t;\lambda) - x_{o,i} - v_i^t|^2$ is a Lipschitz function of $(x_{o,i}, v_i^t)$. Hence, by Theorem 2 of [31] we conclude that

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^N|\tilde\eta_{p,h}(x_{o,i} + v_i^t;\lambda) - x_{o,i} - v_i^t|^2 \overset{a.s.}{=} \mathbb{E}\big(\tilde\eta_{p,h}(X + \sigma_t Z;\lambda) - X - \sigma_t Z\big)^2. \qquad (128)$$

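Moreover, according to the proof presented in Section VI-C, we know that $\tilde\eta_{p,h}'(u;\lambda)$ is bounded and has finitely many discontinuity points. We can then apply the arguments used to prove Equation (4.11) in [23] to conclude that

$$\lim_{N\to\infty}\frac{2\sigma_t^2}{N}\mathrm{div}\big(\tilde\eta_{p,h}(x_o + v^t;\lambda)\big) \overset{a.s.}{=} 2\sigma_t^2\,\mathbb{E}\big(\tilde\eta_{p,h}'(X + \sigma_t Z;\lambda)\big). \qquad (129)$$

Hence, if we combine (127), (128), and (129), we obtain

$$\lim_{N\to\infty}\hat r_{p,h,\lambda}^t \overset{a.s.}{=} \mathbb{E}\big(\tilde\eta_{p,h}(X + \sigma_t Z;\lambda) - X - \sigma_t Z\big)^2 - \sigma_t^2 + 2\sigma_t^2\,\mathbb{E}\big(\tilde\eta_{p,h}'(X + \sigma_t Z;\lambda)\big).$$

Finally, Lemma 25 (with $N = 1$) shows that

$$\mathbb{E}\big(\tilde\eta_{p,h}(X + \sigma_t Z;\lambda) - X - \sigma_t Z\big)^2 - \sigma_t^2 + 2\sigma_t^2\,\mathbb{E}\big(\tilde\eta_{p,h}'(X + \sigma_t Z;\lambda)\big) = \mathbb{E}\big(\tilde\eta_{p,h}(X + \sigma_t Z;\lambda) - X\big)^2,$$

which completes the proof.

(In fact, Theorem 2 of [31] covers the more general class of pseudo-Lipschitz functions $J$.)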